Hello,

I think it is great that we are converging on a Variant type. For the parquet-java implementation, could it be as easy as importing the Spark implementation [1]? I'm not sure this is actually blocking anything, as I'm assuming this gets stored in a binary type today.

Is there an existing C++ implementation? Are there other existing types, defined elsewhere and solving the same need, that we should be paying attention to (or that should become compatible with this)?

Best,
Julien

[1] https://github.com/apache/spark/tree/master/common/variant/src/main/java/org/apache/spark/types/variant

On Fri, Aug 23, 2024 at 2:17 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> > Do we have volunteers to implement it in parquet-java + another OSS implementation?
>
> I don't think that should be a blocker for incorporating it. I'd be inclined to do something like marking it as experimental (or similar) in the spec until the reference implementations are done.
>
> On Fri, Aug 23, 2024 at 10:32 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> > I'm in favor of this, but I'm wondering about the logistics. Do we have volunteers to implement it in parquet-java + another OSS implementation, or are we going to bypass this requirement for now?
> >
> > Thanks,
> > Micah
> >
> > On Friday, August 23, 2024, Ryan Blue <b...@databricks.com.invalid> wrote:
> >
> > > +1
> > >
> > > On Fri, Aug 23, 2024 at 12:30 PM Jacques Nadeau <jacq...@apache.org> wrote:
> > >
> > > > +1
> > > >
> > > > On Fri, Aug 23, 2024 at 8:51 AM Nong Li <non...@gmail.com> wrote:
> > > >
> > > > > +1.
> > > > >
> > > > > On Fri, Aug 23, 2024 at 12:57 PM Jan Finis <jpfi...@gmail.com> wrote:
> > > > >
> > > > > > I would also appreciate having native Variant support in Parquet.
> > > > > >
> > > > > > On Fri, Aug 23, 2024 at 12:10 Fokko Driesprong <fo...@apache.org> wrote:
> > > > > >
> > > > > > > Hey Gang,
> > > > > > >
> > > > > > > Thanks for raising this. +1 from my end.
> > > > > > >
> > > > > > > For context, as Gang mentioned, when we proposed adding a Variant type to Iceberg <https://github.com/apache/iceberg/issues/10392>, one of the future goals was to integrate more closely with Parquet, and having the spec in Parquet will help speed this up.
> > > > > > >
> > > > > > > Kind regards,
> > > > > > > Fokko
> > > > > > >
> > > > > > > On Fri, Aug 23, 2024 at 11:37 Gábor Szádovszky <ga...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi Gang,
> > > > > > > >
> > > > > > > > Thanks for bringing this up.
> > > > > > > >
> > > > > > > > I think that if the Variant type had come up earlier (before Iceberg/Arrow), its natural place would have been at the file format level, like any other type. The communities started discussing where it should be placed because we now have different type systems in different places. Also, the current Variant spec makes it more or less independent of the Parquet file format. However, even at the Parquet level, we would need at least an additional logical type to help systems reading/writing Parquet handle the Variant type.
> > > > > > > >
> > > > > > > > To summarize my opinion: +1 for having the whole Variant spec in the Parquet format.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Gabor
> > > > > > > >
> > > > > > > > On Fri, Aug 23, 2024 at 11:18 Gang Wu <ust...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Apache Iceberg is adding variant type support [1][2] by adopting the variant spec [3] from Apache Spark. As the proposal matures, both the Iceberg [4] and Spark [5] communities are discussing moving the variant type to the Parquet repo to avoid divergence. Moving it into Parquet makes the variant spec engine- and table-format-agnostic, which may encourage wider adoption.
> > > > > > > > >
> > > > > > > > > What do people from the Parquet community think?
> > > > > > > > >
> > > > > > > > > [1] https://lists.apache.org/thread/xnyo1k66dxh0ffpg7j9f04xgos0kwc34
> > > > > > > > > [2] https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq
> > > > > > > > > [3] https://github.com/apache/spark/blob/d84f1a3575c4125009374521d2f179089ebd71ad/common/variant/README.md
> > > > > > > > > [4] https://lists.apache.org/thread/hopkr2f0ftoywwt9zo3jxb7n0ob5s5bw
> > > > > > > > > [5] https://lists.apache.org/thread/0k5oj3mn0049fcxoxm3gx3d7r28gw4rj
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Gang
> > >
> > > --
> > > Ryan Blue
> > > Databricks