Hey Ryan,

Thanks for raising this, and I'm very excited to see V3 being finalized!

> The v3 spec for multi-arg transform only advises to use `source-ids`
> instead of `source-id`. Although it is implicit and obvious that only
> bucket transform can apply to multi-arg transform, it is still unclear the
> order of source columns and algorithm to use to calculate the bucket value.

V3 now uses `source-ids` when there are multiple arguments and `source-id` when there is just one. The PR can be found here <https://github.com/apache/iceberg/pull/12644>. This makes the serialization deterministic without knowing the format-version, which simplifies the readers/writers. After some discussion on the PR, we've decided to leave out the multi-arg bucket transform so the V3 spec can be finalized. As a result, V3 only contains the scaffolding for multi-arg transforms.

> For Iceberg Geo, we are still waiting for the PR of geospatial bounds and
> geospatial predicate to be merged:
> https://github.com/apache/iceberg/pull/12667

I think it is a good idea to distinguish between the spec and the actual code. If we all feel comfortable with the spec, I think we could finalize it. Being comfortable also means knowing that we have a working implementation, but I don't think we have to wrap up all the loose ends before voting on the spec.

On the PyIceberg side, we're also working to catch up on the V3 capabilities <https://github.com/apache/iceberg-python/issues/1818>. Having a Java release that exposes these capabilities helps, so we can do round-trip validation.

Kind regards,
Fokko

On Wed, 30 Apr 2025 at 07:26, Jia Yu <ji...@apache.org> wrote:

> Hi folks,
>
> For Iceberg Geo, we are still waiting for the PR of geospatial bounds and
> geospatial predicate to be merged:
> https://github.com/apache/iceberg/pull/12667
>
> Should a release with core updates include this PR?
>
> Thanks,
> Jia
>
> On Tue, Apr 29, 2025 at 10:21 PM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> Agree with Russell and JB that we make a "RC" release for V3 spec to test
>> implementations, compatibility, etc before finalizing it.
>>
>> Thanks,
>> Manu
>>
>> On Wed, Apr 30, 2025 at 12:24 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Ryan
>>>
>>> It sounds good.
>>>
>>> About multi-args transforms, with the clarification we did a couple of
>>> weeks ago, I think we are good.
>>> Maybe a release with the core updated before announcing spec v3
>>> officially would be a good idea ?
>>>
>>> Regards
>>> JB
>>>
>>> On Wed, Apr 30, 2025 at 00:35, Ryan Blue <rdb...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I think we’ve reached the point where it’s time to finalize and adopt
>>>> the changes for Iceberg v3. We’ve been working toward this for the last few
>>>> months and have now implemented the v3 features in the Java library to
>>>> reduce the risk of needing changes or hitting problems (row lineage support
>>>> in Spark 3.5 just went in!). We’ve also incorporated some clarifications
>>>> and minor changes back into the spec from what we’ve learned.
>>>>
>>>> At this point, I’m confident that the spec is reasonable and correct.
>>>> Thank you to everyone working on these reference implementations!
>>>>
>>>> The next step is to discuss any outstanding items or concerns about
>>>> moving forward, and then to have a vote thread to adopt the spec. I’ll
>>>> start off with a couple of items:
>>>>
>>>> One potential concern is that the upstream Variant spec hasn’t yet been
>>>> finalized by the Parquet community, but we’ve built a full, independent
>>>> implementation in Iceberg to validate the spec. I think the Parquet
>>>> community is primarily waiting on getting the PRs in to have a Java
>>>> reference implementation, so the risk of changes to the Variant spec is
>>>> small.
>>>>
>>>> There’s also an on-going vote to add encryption keys in support of full
>>>> table encryption that I think we want to get in.
>>>>
>>>> Any other items we may want to clear up?
>>>>
>>>> Ryan
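
As a footnote to the `source-id` / `source-ids` point at the top of this mail, here is a minimal Python sketch of what I understand the rule to mean for a partition field: one source column is written as `source-id`, several as `source-ids`, regardless of format-version. The helper names and the `example-multi-arg` transform are hypothetical and only for illustration (V3 does not define a multi-arg transform yet), and this is not the PyIceberg or Java implementation.

```python
import json


def serialize_partition_field(name, field_id, transform, source_ids):
    """Hypothetical helper: build the JSON dict for one partition field.

    Assumption: a single source column is written as `source-id`,
    several source columns as `source-ids`.
    """
    if not source_ids:
        raise ValueError("a partition field needs at least one source column")
    field = {"name": name, "field-id": field_id, "transform": transform}
    if len(source_ids) == 1:
        field["source-id"] = source_ids[0]
    else:
        field["source-ids"] = list(source_ids)
    return field


def parse_source_ids(field):
    """Hypothetical helper: read the source columns back as a list."""
    if "source-id" in field:
        return [field["source-id"]]
    return list(field["source-ids"])


# Single-argument transform: only `source-id` is emitted.
single = serialize_partition_field("ts_day", 1000, "day", [4])

# Placeholder multi-argument transform (not part of the V3 spec).
multi = serialize_partition_field("combined", 1001, "example-multi-arg", [2, 3])

print(json.dumps(single))
print(json.dumps(multi))
```

Because the key that is written depends only on the number of source columns, a reader can parse the field without first checking the table's format-version.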