Thanks Russell and Aihua for pushing Variant support! Given the differences between the supported types and the lack of interest from the other project, I think it is reasonable to copy the specification into our repository. I would put very strong emphasis on sticking to the Spark spec wherever we can, so the two stay as compatible as possible. We could even revert to a shared specification if the situation changes.
Thanks,
Peter

Aihua Xu <aihu...@gmail.com> wrote (on Tue, Aug 13, 2024, 19:52):

> Thanks Russell for bringing this up.
>
> This is the main blocker to moving forward with Variant support in
> Iceberg, and hopefully we can reach a consensus. To me, it also makes
> more sense to move the spec into Iceberg than to have the Spark engine
> own it while we try to stay compatible with the Spark spec.
>
> Thanks,
> Aihua
>
> On Mon, Aug 12, 2024 at 6:50 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> Hi Y’all,
>>
>> We’ve hit a bit of a roadblock with the Variant Proposal. While we were
>> hoping to move the Variant and Shredding specifications from Spark into
>> Iceberg, there doesn’t seem to be a lot of interest in that. Unfortunately,
>> I think we have a number of issues with just linking to the Spark project
>> directly from within Iceberg, and *I believe we need to copy the
>> specifications into our repository*.
>>
>> There are a few reasons why I think this is necessary.
>>
>> First, we have a divergence of types already. The Spark Specification
>> already includes types which Iceberg has no definition for (19, 20
>> <https://github.com/apache/spark/blob/master/common/variant/README.md#encoding-types>
>> - Interval Types), and Iceberg already has a type which is not included
>> within the Spark Specification (Time) and will soon have more with
>> TimestampNS and Geo.
>>
>> Second, we would like to make sure that Spark is not a hard dependency
>> for other engines. We are working with several implementers of the Iceberg
>> spec, and it has previously been agreed that it would be best if the source
>> of truth for Variant existed in an engine- and file-format-neutral location.
>> The Iceberg project has a good open model of governance and, as we have
>> seen so far discussing Variant
>> <https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq>, open
>> and active collaboration. This would also help, as we can strictly version
>> our changes in line with the rest of the Iceberg spec.
>>
>> Third, the Shredding spec is not quite finished and requires some group
>> analysis and discussion before we commit it. Again, I think the Iceberg
>> community is probably the right place for this to happen, as we have already
>> started discussions here on these topics.
>>
>> For these reasons I think we should go with a direct copy of the existing
>> specification from the Spark Project and move ahead with our discussions
>> and modifications within Iceberg. That said, *I do not want to diverge
>> if possible from the Spark proposal*. For example, although we do not
>> use the Interval types above, I think we should not reuse those type ids
>> within our spec. Iceberg's Variant Spec types 19 and 20 would remain unused,
>> along with any other types we think are not applicable. We should strive
>> whenever possible to allow for compatibility.
>>
>> In the interest of moving forward with this proposal, I am hoping to see
>> if anyone in the community objects to this plan or has a better
>> alternative.
>>
>> As always, I am thankful for your time and am eager to hear back from
>> everyone,
>> Russ