Hey Ryan,

Thanks for raising this, and I'm very excited to see V3 being finalized!

> The v3 spec for multi-arg transforms only advises using `source-ids`
> instead of `source-id`. Although it is implicit and obvious that only the
> bucket transform can apply to multi-arg transforms, the order of the source
> columns and the algorithm used to calculate the bucket value are still
> unclear.
>

V3 now uses `source-ids` when there are multiple arguments and `source-id`
when there is just one. The PR can be found here:
<https://github.com/apache/iceberg/pull/12644>. This makes the
serialization deterministic without knowing the format-version, which
simplifies the readers and writers. After some discussion on the PR, we
decided to leave out the multi-arg bucket transform so that the V3 spec can
be finalized. V3 therefore contains only the scaffolding for multi-arg
transforms.
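To illustrate the serialization rule, here is a minimal sketch in Python (picked because PyIceberg will need to follow the same rule). It assumes a partition field is a plain dict serialized with the standard `json` module; `serialize_partition_field`, `parse_source_ids`, and the `hypothetical_multi_arg` transform name are made up for illustration and are not part of PyIceberg or the Java library. The point is only that the key is chosen from the number of arguments, not from the format-version.

```python
import json


def serialize_partition_field(source_ids, field_id, name, transform):
    """Hypothetical helper: build the JSON object for one partition field.

    Single-argument transforms write `source-id`, multi-argument transforms
    write `source-ids`; the table's format-version is never consulted.
    """
    if not source_ids:
        raise ValueError("a partition field needs at least one source column")
    field = {"field-id": field_id, "name": name, "transform": transform}
    if len(source_ids) == 1:
        field["source-id"] = source_ids[0]
    else:
        field["source-ids"] = list(source_ids)
    # sort_keys only makes the output stable for comparison in this sketch
    return json.dumps(field, sort_keys=True)


def parse_source_ids(field_json):
    """Hypothetical helper: read the source column IDs back as a list.

    Assumes a reader accepts either key, so it needs no format-version lookup.
    """
    field = json.loads(field_json)
    if "source-ids" in field:
        return list(field["source-ids"])
    if "source-id" in field:
        return [field["source-id"]]
    raise ValueError("partition field has neither source-id nor source-ids")


single = serialize_partition_field([1], 1000, "id_bucket", "bucket[16]")
multi = serialize_partition_field([1, 2], 1001, "id_ts", "hypothetical_multi_arg")
print(parse_source_ids(single))  # [1]
print(parse_source_ids(multi))   # [1, 2]
```

Since V3 ships no multi-arg transform yet, the multi-argument branch above only exercises the scaffolding.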

> For Iceberg Geo, we are still waiting for the PR of geospatial bounds and
> geospatial predicate to be merged:
> https://github.com/apache/iceberg/pull/12667


I think it is a good idea to distinguish between the spec and the actual
code. If we all feel comfortable with the spec, I think we could finalize
it. Being comfortable also means that we know that we have a working
implementation, but I don't think we have to wrap up all the loose ends
before voting on the spec.

On the PyIceberg side, we're also working to catch up on the V3 capabilities
<https://github.com/apache/iceberg-python/issues/1818>. Having a Java
release that exposes these capabilities would help, since it lets us do
round-trip validation.

Kind regards,
Fokko


On Wed, Apr 30, 2025 at 07:26, Jia Yu <ji...@apache.org> wrote:

> Hi folks,
>
> For Iceberg Geo, we are still waiting for the PR of geospatial bounds and
> geospatial predicate to be merged:
> https://github.com/apache/iceberg/pull/12667
>
> Should a release with core updates include this PR?
>
> Thanks,
> Jia
>
> On Tue, Apr 29, 2025 at 10:21 PM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> I agree with Russell and JB that we should make an "RC" release of the V3
>> spec to test implementations, compatibility, etc. before finalizing it.
>>
>> Thanks,
>> Manu
>>
>> On Wed, Apr 30, 2025 at 12:24 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Ryan
>>>
>>> It sounds good.
>>>
>>> Regarding multi-arg transforms, with the clarification we made a couple
>>> of weeks ago, I think we are good.
>>> Maybe a release with the updated core before officially announcing spec
>>> v3 would be a good idea?
>>>
>>> Regards
>>> JB
>>>
>>> On Wed, Apr 30, 2025 at 00:35, Ryan Blue <rdb...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I think we’ve reached the point where it’s time to finalize and adopt
>>>> the changes for Iceberg v3. We’ve been working toward this for the last few
>>>> months and have now implemented the v3 features in the Java library to
>>>> reduce the risk of needing changes or hitting problems (row lineage support
>>>> in Spark 3.5 just went in!). We’ve also incorporated some clarifications
>>>> and minor changes back into the spec from what we’ve learned.
>>>>
>>>> At this point, I’m confident that the spec is reasonable and correct.
>>>> Thank you to everyone working on these reference implementations!
>>>>
>>>> The next step is to discuss any outstanding items or concerns about
>>>> moving forward, and then to have a vote thread to adopt the spec. I’ll
>>>> start off with a couple of items:
>>>>
>>>> One potential concern is that the upstream Variant spec hasn’t yet been
>>>> finalized by the Parquet community, but we’ve built a full, independent
>>>> implementation in Iceberg to validate the spec. I think the Parquet
>>>> community is primarily waiting on getting the PRs in to have a Java
>>>> reference implementation, so the risk of changes to the Variant spec is
>>>> small.
>>>>
>>>> There’s also an ongoing vote to add encryption keys in support of full
>>>> table encryption that I think we want to get in.
>>>>
>>>> Any other items we may want to clear up?
>>>>
>>>> Ryan
>>>>
>>>
