Hi Fokko

Thanks for your reply!

I agree with your points.

1. About "housing" of Iceberg components, I fully agree. It's always
better to have them in the involved projects (from both a maintenance
and a governance standpoint). Sometimes it's not easy (look at the
Iceberg engines, or Apache Camel and all its components), but it's the
preferred approach. I plan to work on an Iceberg component for Apache
Camel, and it would live in Apache Camel. So for Iceberg 2.x, let's
focus on Kafka Connect-related topics, as it's part of Iceberg.

2. About the roadmap page, I think we should:
2.1. Directly provide the list of GitHub milestones and link to them
2.2. Mention the release cadence as in the related PR #9666 (thanks
for your comment there :))

3. I agree we should start a discussion following Dan's thread about
the proposal process (GitHub issue with a proposal tag, etc.).

Thanks!
Regards
JB

On Wed, Mar 13, 2024 at 9:11 AM Fokko Driesprong <fo...@apache.org> wrote:
>
> Hey JB,
>
> Thanks for raising this. Sorry for the late reply, but I was OOO last week. I 
> think in general the progress is being kept on the spec itself. Also, some 
> features are already available (default values in Python, and nanosecond 
> timestamps are being worked on in Java), and I would rather expose these 
> features already using a feature flag, rather than waiting for the spec to be 
> finalized. It would be nice to finalize the spec at some point, to allow 
> engines that support Iceberg to say that they support up to Spec v3.
>
>> * Data Ingestion (e.g. Kafka Connect sink)
>
>
> I'd rather organize these integrations bottom-up than top-down. We only want 
> to ensure that similar solutions are being developed in parallel. For Kafka 
> Connect it is part of the Iceberg repository, but it makes more sense to push 
> this to the project itself (for example in Beam) when possible. With Hive 
> 4.0.0 the Iceberg integration will also be moved to the Hive side, so that's 
> also a good opportunity to remove it from the Iceberg repository.
>
>> We have this page https://iceberg.apache.org/roadmap/. I'm not sure it's 
>> actually up to date.
>
>
> It is very outdated, and I believe it is best to remove it for now. 
> Every project is adopting the V3 spec already (default values in PyIceberg, 
> nanoseconds in Java).
>
>> I also proposed this https://github.com/apache/iceberg/pull/9666 to give a 
>> rough idea.
>
>
> We're doing a release almost every quarter, and I agree it is good 
> to establish that as a cadence. I've left a small comment on the PR.
>
>> That's a raw start to the discussion; I propose creating a GitHub "Discussion" 
>> issue (flagged with the 2.0.0 milestone) for each topic where we have consensus.
>
>
> There is already a 2.0.0 milestone, and we should use it to indicate what we 
> want to get into 2.0.0. I'm open to creating a Discussion issue if more 
> people think this is a good idea (typically this was discussed on the mailing 
> list within the ASF context).
>
> Thanks,
> Fokko
>
> On Mon, Mar 11, 2024 at 7:34 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Hi folks,
>>
>> I forgot to provide some background about this thread. I started
>> this thread because I think it's important to give visibility to
>> our community, not necessarily with firm dates, but more about
>> roughly what could be expected and when. Without this, it's pretty
>> hard for our users to define their own roadmap.
>>
>> We have this page https://iceberg.apache.org/roadmap/. I'm not sure
>> it's actually up to date.
>> I also proposed this https://github.com/apache/iceberg/pull/9666 to
>> give a rough idea.
>>
>> So I think it would be good to reach a consensus about the roadmap and
>> update the roadmap page on the website to provide some visibility (it would
>> be helpful for us too :)).
>>
>> Thoughts?
>>
>> Regards
>> JB
>>
>> On Thu, Mar 7, 2024 at 7:43 PM Jean-Baptiste Onofré <j...@nanthrax.net> 
>> wrote:
>> >
>> > Hi Ryan
>> >
>> > Yeah, I agree we should have separate discussions for each topic. That was 
>> > actually my intention ;)
>> >
>> > I just wanted to have thoughts from everyone about roadmap/timeline.
>> >
>> > Jack and I will start a dedicated thread about REST catalog.
>> >
>> > Thanks!
>> >
>> > Regards
>> > JB
>> >
>> >
>> > On Thu, Mar 7, 2024 at 6:34 PM Ryan Blue <b...@tabular.io> wrote:
>> >>
>> >> Hi JB,
>> >>
>> >> Specs and libraries are versioned separately. In fact, the v2 spec has 
>> >> already been voted on and adopted. The next spec version is v3.
>> >>
>> >> I think we do want to get to a 2.0 of the Java library sometime soon to 
>> >> drop some deprecated APIs and clean up a few things, but I don't think 
>> >> that we're quite ready to take that on right now, which is likely why 
>> >> there has been little activity on this thread.
>> >>
>> >> I also think that most of these things are going to be discussion points 
>> >> that we cover as separate topics, rather than one big "everything 2.0" 
>> >> thread. It just doesn't seem manageable to me to cover them all at once. 
>> >> Maybe that's just me though.
>> >>
>> >> Ryan
>> >>
>> >> On Thu, Mar 7, 2024 at 7:49 AM Jean-Baptiste Onofré <j...@nanthrax.net> 
>> >> wrote:
>> >>>
>> >>> Hi guys,
>> >>>
>> >>> Let me ping this thread again ;)
>> >>>
>> >>> I think it would be great to give some visibility to the community,
>> >>> especially about Spec v3 and Iceberg 2.0.0.
>> >>>
>> >>> Any comments about Spec V2 / Iceberg 2.0.0?
>> >>>
>> >>> Thanks!
>> >>> Regards
>> >>> JB
>> >>>
>> >>> On Fri, Feb 16, 2024 at 4:52 PM Jean-Baptiste Onofré <j...@nanthrax.net> 
>> >>> wrote:
>> >>> >
>> >>> > Hi guys,
>> >>> >
>> >>> > During the last community meeting, we briefly started discussing
>> >>> > Iceberg 2.0.
>> >>> > I was quite surprised it came up during the community meeting because I
>> >>> > don't remember any previous discussion (on the mailing list)
>> >>> > about it.
>> >>> >
>> >>> > So, I would like to start an open discussion about our
>> >>> > community-driven roadmap.
>> >>> >
>> >>> > I see the following topics that should be discussed (maybe, as proposed
>> >>> > by Brian, we can have corresponding GitHub issues tagged with a
>> >>> > "discussion" label). These are open questions; feel free to add points I
>> >>> > missed:
>> >>> >
>> >>> > * Spec v3
>> >>> >     We have the discussion about ts_nanosecond, and other enhancements
>> >>> > in the spec. Do we plan to have Iceberg 2.0 with Spec v3? What do we
>> >>> > target for inclusion in Spec v3?
>> >>> > * Catalogs
>> >>> >     We have a consensus that we have too many catalogs, especially
>> >>> > with different capabilities/issues. Jack already started the
>> >>> > discussion to deprecate DynamoDBCatalog. The questions are:
>> >>> >      - Where do we want the catalogs to live (repository)?
>> >>> >      - Which catalogs do we want to deprecate (HadoopCatalog, for 
>> >>> > instance :))?
>> >>> >      - Do we want to have the REST Catalog as a kind of façade for
>> >>> > other catalogs/backends?
>> >>> > * REST Catalog
>> >>> >    If we want to use the REST Catalog as a façade, what are the
>> >>> > requirements to make it even more pluggable, both for the backend (other
>> >>> > catalogs) and for the REST layer itself (authentication/authorization,
>> >>> > runtime, etc.)? Jack also started a discussion about permissions on the
>> >>> > REST catalog.
>> >>> > * Engines
>> >>> >    Which engines (and versions) do we plan to keep supporting? What new
>> >>> > engines do we plan to add (for instance, I can work on an Apache Beam and
>> >>> > an Apache Karaf powered engine)?
>> >>> > * Data file formats / Table formats
>> >>> >    Do we plan to add/remove/update data file formats for 2.0 (Parquet,
>> >>> > ORC, ...)?
>> >>> >    Same question about table formats: do we plan a kind of "tool" to
>> >>> > migrate data from other table formats to Iceberg?
>> >>> > * Data Ingestion (e.g. Kafka Connect sink)
>> >>> >    Iceberg 1.5.0 will include the first building blocks of Kafka Connect,
>> >>> > and new ones will come with 1.6+.
>> >>> >    What do we plan for Iceberg 2.0 on this front? Do we plan an
>> >>> > additional layer next to Kafka Connect (for instance, why not provide
>> >>> > an Apache Camel component to read/write data to Iceberg)?
>> >>> > * Rough date: depending on all previous points (and maybe others :)),
>> >>> > when do we target 2.0.0?
>> >>> >
>> >>> > That's a raw start to the discussion; I propose creating a GitHub
>> >>> > "Discussion" issue (flagged with the 2.0.0 milestone) for each topic
>> >>> > where we have consensus.
>> >>> >
>> >>> > Thoughts?
>> >>> >
>> >>> > Regards
>> >>> > JB
>> >>
>> >>
>> >>
>> >> --
>> >> Ryan Blue
>> >> Tabular
