Thanks Russell and Aihua for pushing Variant support!

Given the differences between the supported types and the lack of interest
from the Spark project, I think it is reasonable to copy the
specification into our repository.
I would put strong emphasis on sticking to the Spark spec wherever
possible to preserve compatibility. We could even return to a
shared specification if the situation changes.

Thanks,
Peter

Aihua Xu <aihu...@gmail.com> wrote (on Tue, Aug 13, 2024, 19:52):

> Thanks Russell for bringing this up.
>
> This is the main blocker to moving forward with Variant support in
> Iceberg, and hopefully we can reach a consensus. I also feel it makes
> more sense to move the spec into Iceberg than to have the Spark engine own
> it while we try to stay compatible with the Spark spec.
>
> Thanks,
> Aihua
>
> On Mon, Aug 12, 2024 at 6:50 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> Hi Y’all,
>>
>> We’ve hit a bit of a roadblock with the Variant Proposal. While we were
>> hoping to move the Variant and Shredding specifications from Spark into
>> Iceberg, there doesn’t seem to be a lot of interest in that. Unfortunately,
>> I think we have a number of issues with just linking to the Spark project
>> directly from within Iceberg, and *I believe we need to copy the
>> specifications into our repository*.
>>
>> There are a few reasons why I think this is necessary:
>>
>> First, we have a divergence of types already. The Spark Specification
>> already includes types which Iceberg has no definition for (19, 20
>> <https://github.com/apache/spark/blob/master/common/variant/README.md#encoding-types>
>> - Interval Types), and Iceberg already has a type which is not included
>> within the Spark Specification (Time) and will soon have more with
>> TimestampNS and Geo.
>>
>> Second, we would like to make sure that Spark is not a hard dependency
>> for other engines. We are working with several implementers of the Iceberg
>> spec, and it has previously been agreed that it would be best if the source
>> of truth for Variant existed in an engine- and file-format-neutral location.
>> The Iceberg project has a good open model of governance and, as we have
>> seen so far discussing Variant
>> <https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq>, open
>> and active collaboration. This would also help, as we can strictly version
>> our changes in line with the rest of the Iceberg spec.
>>
>> Third, the Shredding spec is not quite finished and requires some group
>> analysis and discussion before we commit it. Again, I think the Iceberg
>> community is probably the right place for this to happen, as we have already
>> started discussions here on these topics.
>>
>> For these reasons I think we should go with a direct copy of the existing
>> specification from the Spark Project and move ahead with our discussions
>> and modifications within Iceberg. That said, *I do not want to diverge
>> from the Spark proposal if possible*. For example, although we do not
>> use the Interval types above, I think we should not reuse those type IDs
>> within our spec. Iceberg's Variant Spec types 19 and 20 would remain unused
>> along with any other types we think are not applicable. We should strive
>> whenever possible to allow for compatibility.
>>
>> In the interest of moving forward with this proposal I am hoping to see
>> if anyone in the community objects to this plan going forward or has a
>> better alternative.
>>
>> As always I am thankful for your time and am eager to hear back from
>> everyone,
>> Russ
>>
>>
