Hi Y’all,

We’ve hit a bit of a roadblock with the Variant Proposal. While we were
hoping to move the Variant and Shredding specifications from Spark into
Iceberg, there doesn’t seem to be much interest in that. Unfortunately,
I think linking directly to the Spark project from within Iceberg raises a
number of issues, and *I believe we need to copy the specifications into
our repository*.

There are a few reasons why I think this is necessary:

First, our types have already diverged. The Spark Specification
already includes types for which Iceberg has no definition (types 19 and 20
<https://github.com/apache/spark/blob/master/common/variant/README.md#encoding-types>,
the Interval types), and Iceberg already has a type that is not included
in the Spark Specification (Time) and will soon have more with
TimestampNS and Geo.

Second, we would like to make sure that Spark is not a hard dependency for
other engines. We are working with several implementers of the Iceberg spec
and it has previously been agreed that it would be best if the source of
truth for Variant existed in an engine and file format neutral location.
The Iceberg project has a good open model of governance and, as we have
seen so far discussing Variant
<https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq>, open
and active collaboration. Hosting the spec in Iceberg would also let us
version our changes strictly in line with the rest of the Iceberg spec.

Third, the Shredding spec is not quite finished and requires some group
analysis and discussion before we commit it. Again, I think the Iceberg
community is probably the right place for this to happen as we have already
started discussions here on these topics.

For these reasons I think we should go with a direct copy of the existing
specification from the Spark Project and move ahead with our discussions
and modifications within Iceberg. That said, *I do not want to diverge from
the Spark proposal if we can avoid it*. For example, although we do not use the
Interval types above, I think we should not reuse those type ids within our
spec. Iceberg's Variant Spec types 19 and 20 would remain unused along with
any other types we think are not applicable. We should strive whenever
possible to allow for compatibility.

In the interest of moving forward with this proposal, I would like to know
whether anyone in the community objects to this plan or has a better
alternative.

As always I am thankful for your time and am eager to hear back from
everyone,
Russ
