Hi Ryan.

Thanks for starting this.

I share the same concern as Russell regarding the recent discussion about 
`metadata.json.gz`. I think it's a good time to clarify the behavior and 
perhaps allow for additional compression algorithms here. We can start a 
separate discussion thread if needed.

> At the PyIceberg side, we're also working to catch up on the V3 capabilities 
> <https://github.com/apache/iceberg-python/issues/1818>. Having a Java release 
> that exposes these capabilities helps, so we can do round-trip validation.

Agreed. We can begin work on the iceberg-rust side after the Java release.

On Wed, Apr 30, 2025, at 13:47, Fokko Driesprong wrote:
> Hey Ryan,
> 
> Thanks for raising this, and I'm very excited to see V3 being finalized!
> 
>> The v3 spec for multi-arg transform only advises to use `source-ids` instead 
>> of `source-id`. Although it is implicit and obvious that only bucket 
>> transform can apply to multi-arg transform, the order of source columns and 
>> the algorithm used to calculate the bucket value are still unclear.
> 
> V3 now uses `source-ids` when there are multiple arguments and `source-id` when 
> there is just one. The PR can be found here 
> <https://github.com/apache/iceberg/pull/12644>. This makes the serialization 
> deterministic without knowing the format-version, simplifying the 
> readers/writers. After some discussion on the PR, we've decided to leave out 
> the multi-arg bucket transform so the V3 spec can be finalized. So V3 only 
> contains the scaffolding for multi-arg transforms.
> 
>> For Iceberg Geo, we are still waiting for the PR of geospatial bounds and 
>> geospatial predicate to be merged: 
>> https://github.com/apache/iceberg/pull/12667
> 
> I think it is a good idea to distinguish between the spec and the actual 
> code. If we all feel comfortable with the spec, I think we could finalize it. 
> Being comfortable also means that we know that we have a working 
> implementation, but I don't think we have to wrap up all the loose ends 
> before voting on the spec.
> 
> At the PyIceberg side, we're also working to catch up on the V3 capabilities 
> <https://github.com/apache/iceberg-python/issues/1818>. Having a Java release 
> that exposes these capabilities helps, so we can do round-trip validation.
> 
> Kind regards,
> Fokko
> 
> 
> On Wed, Apr 30, 2025 at 07:26, Jia Yu <ji...@apache.org> wrote:
>> Hi folks,
>> 
>> For Iceberg Geo, we are still waiting for the PR of geospatial bounds and 
>> geospatial predicate to be merged: 
>> https://github.com/apache/iceberg/pull/12667
>> 
>> Should a release with core updates include this PR?
>> 
>> Thanks,
>> Jia
>> 
>> On Tue, Apr 29, 2025 at 10:21 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>> Agree with Russell and JB that we should make an "RC" release for the V3 spec 
>>> to test implementations, compatibility, etc. before finalizing it.
>>> 
>>> Thanks,
>>> Manu
>>> 
>>> On Wed, Apr 30, 2025 at 12:24 PM Jean-Baptiste Onofré <j...@nanthrax.net> 
>>> wrote:
>>>> Hi Ryan
>>>> 
>>>> It sounds good. 
>>>> 
>>>> About multi-arg transforms, with the clarification we did a couple of 
>>>> weeks ago, I think we are good. 
>>>> Maybe a release with the core updated before announcing spec v3 officially 
>>>> would be a good idea?
>>>> 
>>>> Regards
>>>> JB
>>>> 
>>>> On Wed, Apr 30, 2025 at 00:35, Ryan Blue <rdb...@gmail.com> wrote:
>>>>> Hi everyone,
>>>>> 
>>>>> I think we’ve reached the point where it’s time to finalize and adopt the 
>>>>> changes for Iceberg v3. We’ve been working toward this for the last few 
>>>>> months and have now implemented the v3 features in the Java library to 
>>>>> reduce the risk of needing changes or hitting problems (row lineage 
>>>>> support in Spark 3.5 just went in!). We’ve also incorporated some 
>>>>> clarifications and minor changes back into the spec from what we’ve 
>>>>> learned.
>>>>> 
>>>>> At this point, I’m confident that the spec is reasonable and correct. 
>>>>> Thank you to everyone working on these reference implementations!
>>>>> 
>>>>> The next step is to discuss any outstanding items or concerns about 
>>>>> moving forward, and then to have a vote thread to adopt the spec. I’ll 
>>>>> start off with a couple of items:
>>>>> 
>>>>> One potential concern is that the upstream Variant spec hasn’t yet been 
>>>>> finalized by the Parquet community, but we’ve built a full, independent 
>>>>> implementation in Iceberg to validate the spec. I think the Parquet 
>>>>> community is primarily waiting on getting the PRs in to have a Java 
>>>>> reference implementation, so the risk of changes to the Variant spec is 
>>>>> small.
>>>>> 
>>>>> There’s also an on-going vote to add encryption keys in support of full 
>>>>> table encryption that I think we want to get in.
>>>>> 
>>>>> Any other items we may want to clear up?
>>>>> 
>>>>> Ryan
>>>>> 
>>>>> 

Xuanwo

https://xuanwo.io/
