A separate repo for the variant type makes sense to me. I also don't think we need two reference implementations ready before adoption, because it is already a released spec.

> Is the intent to release it independently of the Parquet-format spec?
> I see the Variant type also has a version.

IIUC, the version field in the variant spec indicates how variant data is encoded. If this is the case, we should bump the parquet-format version when a new encoding scheme is introduced.

Best,
Gang

On Sat, Aug 24, 2024 at 8:43 AM Julien Le Dem wrote:

> (Note: I am also catching up on the threads linked in the email)
>
> On Fri, Aug 23, 2024 at 5:38 PM Julien Le Dem wrote:
>
> > I am in favor of making this a separate artifact that other projects can depend on without pulling extra dependencies they might not want.
> > What do others think about a separate repo?
> > Is the intent to release it independently of the Parquet-format spec? I see the Variant type also has a version.
> > Julien
> >
> > On Fri, Aug 23, 2024 at 4:31 PM Daniel Weeks wrote:
> >
> > > Julien,
> > >
> > > I think there's interest in supporting multiple language implementations for variant (java/scala/cpp/rust/etc), so we might want to consider having a 'parquet-variant' repository to house the spec and language implementations. That might also help to keep them aligned, but open to other suggestions.
> > >
> > > -Dan
> > >
> > > On Fri, Aug 23, 2024 at 3:07 PM Julien Le Dem wrote:
> > >
> > > > Hello,
> > > > I think it is great that we are converging on a Variant type.
> > > > For the parquet-java implementation, it looks like it could be as easy as importing the Spark implementation [1]?
> > > > I'm not sure this is actually blocking anything as I'm assuming this gets stored in a binary type today.
> > > > Is there an existing Cpp implementation?
> > > > Are there other existing types defined somewhere else solving that same need that we should be paying attention to? (or should become compatible with this)
> > > > Best
> > > > Julien
> > > > [1] https://github.com/apache/spark/tree/master/common/variant/src/main/java/org/apache/spark/types/variant
> > > >
> > > > On Fri, Aug 23, 2024 at 2:17 PM Jacques Nadeau wrote:
> > > >
> > > > > > Do we have volunteers to implement it in Parquet-java + another OSS implementation?
> > > > >
> > > > > I don't think that should be a blocker for incorporating. I'd be inclined to do something like mark it as experimental or similar in the spec until the reference impls are done.
> > > > >
> > > > > On Fri, Aug 23, 2024 at 10:32 AM Micah Kornfield wrote:
> > > > >
> > > > > > I'm in favor of this, but wondering about the logistics. Do we have volunteers to implement it in Parquet-java + another OSS implementation or are we going to bypass this requirement for now?
> > > > > >
> > > > > > Thanks,
> > > > > > Micah
> > > > > >
> > > > > > On Friday, August 23, 2024, Ryan Blue wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > On Fri, Aug 23, 2024 at 12:30 PM Jacques Nadeau wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > On Fri, Aug 23, 2024 at 8:51 AM Nong Li wrote:
> > > > > > > >
> > > > > > > > > +1.
> > > > > > > > >
> > > > > > > > > On Fri, Aug 23, 2024 at 12:57 PM Jan Finis wrote:
> > > > > > > > >
> > > > > > > > > > I would also appreciate having native Variant support in Parquet.
> > > > > > > > > >
> > > > > > > > > > On Fri, Aug 23, 2024 at 12:10 PM Fokko Driesprong wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey Gang,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for raising this. +1 from my end.
> > > > > > > > > > >
> > > > > > > > > > > For context, as Gang mentioned, when proposing to add a Variant Type to Iceberg <https://github.com/apache/iceberg/issues/10392>, one of the future goals was to integrate more closely with Parquet, and having the spec at Parquet will help to speed this up.
> > > > > > > > > > >
> > > > > > > > > > > Kind regards,
> > > > > > > > > > > Fokko
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 23, 2024 at 11:37 AM Gábor Szádovszky wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Gang,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for bringing this up.
> > > > > > > > > > > >
> > > > > > > > > > > > I think that if the Variant type had come up earlier (before Iceberg/Arrow), its natural place would have been at the file format level, like any other type. The communities started discussing where it should be placed because now we have different type systems at different places.
> > > > > > > > > > > > Also, the current spec of Variant makes it more or less independent from the Parquet file format.
> > > > > > > > > > > > However, even at the Parquet level, we would need at least an additional Logical type to help systems reading/writing Parquet handle the Variant type.
> > > > > > > > > > > >
> > > > > > > > > > > > To summarize my opinion, +1 for having the whole Variant spec in Parquet format.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Gabor
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Aug 23, 2024 at 11:18 AM Gang Wu wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Apache Iceberg is adding variant type support [1][2] by adopting the variant spec [3] from Apache Spark. As the proposal matures, both the Iceberg [4] and Spark [5] communities are discussing moving the variant type to the Parquet repo to avoid divergence. Moving it into Parquet makes the variant spec engine and table format agnostic, which may encourage wider adoption.
> > > > > > > > > > > > >
> > > > > > > > > > > > > What do people from the Parquet community think?
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://lists.apache.org/thread/xnyo1k66dxh0ffpg7j9f04xgos0kwc34
> > > > > > > > > > > > > [2] https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq
> > > > > > > > > > > > > [3] https://github.com/apache/spark/blob/d84f1a3575c4125009374521d2f179089ebd71ad/common/variant/README.md
> > > > > > > > > > > > > [4] https://lists.apache.org/thread/hopkr2f0ftoywwt9zo3jxb7n0ob5s5bw
> > > > > > > > > > > > > [5] https://lists.apache.org/thread/0k5oj3mn0049fcxoxm3gx3d7r28gw4rj
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Gang
> > > > > > >
> > > > > > > --
> > > > > > > Ryan Blue
> > > > > > > Databricks
