+1 to copy the spec into our repository. I think the best way to keep compatibility is building integration tests.
Thanks,
Manu

On Wed, Aug 14, 2024 at 8:27 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Thanks Russell and Aihua for pushing Variant support!
>
> Given the differences between the supported types and the lack of interest
> from the other project, I think it is reasonable to duplicate the
> specification to our repository.
> I would put very strong emphasis on sticking to the Spark spec as much as
> possible to preserve compatibility. Maybe even revert to a shared
> specification if the situation changes.
>
> Thanks,
> Peter
>
> Aihua Xu <aihu...@gmail.com> wrote (on Tue, Aug 13, 2024, 19:52):
>
>> Thanks Russell for bringing this up.
>>
>> This is the main blocker to moving forward with Variant support in
>> Iceberg, and hopefully we can reach a consensus. I also feel it makes
>> more sense to move the spec into Iceberg rather than having the Spark
>> engine own it while we try to keep ours compatible with the Spark spec.
>>
>> Thanks,
>> Aihua
>>
>> On Mon, Aug 12, 2024 at 6:50 PM Russell Spitzer <
>> russell.spit...@gmail.com> wrote:
>>
>>> Hi Y’all,
>>>
>>> We’ve hit a bit of a roadblock with the Variant Proposal. While we were
>>> hoping to move the Variant and Shredding specifications from Spark into
>>> Iceberg, there doesn’t seem to be a lot of interest in that. Unfortunately,
>>> I think we have a number of issues with just linking to the Spark project
>>> directly from within Iceberg and *I believe we need to copy the
>>> specifications into our repository*.
>>>
>>> There are a few reasons why I think this is necessary:
>>>
>>> First, we have a divergence of types already. The Spark Specification
>>> already includes types which Iceberg has no definition for (19, 20
>>> <https://github.com/apache/spark/blob/master/common/variant/README.md#encoding-types>
>>> - Interval Types) and Iceberg already has a type which is not included
>>> within the Spark Specification (Time) and will soon have more with
>>> TimestampNS and Geo.
>>>
>>> Second, we would like to make sure that Spark is not a hard dependency
>>> for other engines. We are working with several implementers of the Iceberg
>>> spec and it has previously been agreed that it would be best if the source
>>> of truth for Variant existed in an engine- and file-format-neutral location.
>>> The Iceberg project has a good open model of governance and, as we have
>>> seen so far discussing Variant
>>> <https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq>,
>>> open and active collaboration. This would also help as we can strictly
>>> version our changes in line with the rest of the Iceberg spec.
>>>
>>> Third, the Shredding spec is not quite finished and requires some group
>>> analysis and discussion before we commit it. Again, I think the Iceberg
>>> community is probably the right place for this to happen, as we have already
>>> started discussions here on these topics.
>>>
>>> For these reasons I think we should go with a direct copy of the
>>> existing specification from the Spark Project and move ahead with our
>>> discussions and modifications within Iceberg. That said, *I do not want
>>> to diverge if possible from the Spark proposal*. For example, although
>>> we do not use the Interval types above, I think we should not reuse
>>> those type IDs within our spec. Types 19 and 20 in Iceberg's Variant Spec
>>> would remain unused, along with any other types we think are not applicable.
>>> We should strive whenever possible to allow for compatibility.
>>>
>>> In the interest of moving forward with this proposal, I am hoping to see
>>> whether anyone in the community objects to this plan or has a better
>>> alternative.
>>>
>>> As always, I am thankful for your time and am eager to hear back from
>>> everyone,
>>> Russ