+1 to copy the spec into our repository. I think the best way to keep
compatibility is building integration tests.

Thanks,
Manu

On Wed, Aug 14, 2024 at 8:27 PM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> Thanks Russell and Aihua for pushing Variant support!
>
> Given the differences between the supported types and the lack of interest
> from the other project, I think it is reasonable to duplicate the
> specification to our repository.
> I would put very strong emphasis on sticking to the Spark spec as closely as
> possible, to preserve compatibility. We could even revert to a
> shared specification if the situation changes.
>
> Thanks,
> Peter
>
> Aihua Xu <aihu...@gmail.com> wrote (on Tue, Aug 13, 2024, 19:52):
>
>> Thanks Russell for bringing this up.
>>
>> This is the main blocker to move forward with the Variant support in
>> Iceberg and hopefully we can have a consensus. To me, I also feel it makes
>> more sense to move the spec into Iceberg, rather than have the Spark engine
>> own it, and for us to keep it compatible with the Spark spec.
>>
>> Thanks,
>> Aihua
>>
>> On Mon, Aug 12, 2024 at 6:50 PM Russell Spitzer <
>> russell.spit...@gmail.com> wrote:
>>
>>> Hi Y’all,
>>>
>>> We’ve hit a bit of a roadblock with the Variant Proposal. While we were
>>> hoping to move the Variant and Shredding specifications from Spark into
>>> Iceberg, there doesn’t seem to be a lot of interest in that. Unfortunately,
>>> I think we have a number of issues with just linking to the Spark project
>>> directly from within Iceberg and *I believe we need to copy the
>>> specifications into our repository*.
>>>
>>> There are a few reasons why I think this is necessary:
>>>
>>> First, we have a divergence of types already. The Spark Specification
>>> already includes types which Iceberg has no definition for (19, 20
>>> <https://github.com/apache/spark/blob/master/common/variant/README.md#encoding-types>
>>> - Interval Types) and Iceberg already has a type which is not included
>>> within the Spark Specification (Time) and will soon have more with
>>> TimestampNS and Geo.
>>>
>>> Second, we would like to make sure that Spark is not a hard dependency
>>> for other engines. We are working with several implementers of the Iceberg
>>> spec and it has previously been agreed that it would be best if the source
>>> of truth for Variant existed in an engine and file format neutral location.
>>> The Iceberg project has a good open model of governance and, as we have
>>> seen so far discussing Variant
>>> <https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq>,
>>> open and active collaboration. This would also help as we can strictly
>>> version our changes in-line with the rest of the Iceberg spec.
>>>
>>> Third, the Shredding spec is not quite finished and requires some group
>>> analysis and discussion before we commit it. I think again the Iceberg
>>> community is probably the right place for this to happen as we have already
>>> started discussions here on these topics.
>>>
>>> For these reasons I think we should go with a direct copy of the
>>> existing specification from the Spark Project and move ahead with our
>>> discussions and modifications within Iceberg. That said, *I do not want
>>> to diverge if possible from the Spark proposal*. For example, although
>>> we do not use the Interval types above, I think we should not reuse
>>> those type ids within our spec. Iceberg's Variant Spec types 19 and 20
>>> would remain unused along with any other types we think are not applicable.
>>> We should strive whenever possible to allow for compatibility.
>>>
>>> In the interest of moving forward with this proposal I am hoping to see
>>> if anyone in the community objects to this plan going forward or has a
>>> better alternative.
>>>
>>> As always I am thankful for your time and am eager to hear back from
>>> everyone,
>>> Russ
>>>
>>>
