(Note: I am also catching up on the threads linked in the email.)

On Fri, Aug 23, 2024 at 5:38 PM Julien Le Dem <jul...@apache.org> wrote:
> I am in favor of making this a separate artifact that other projects can depend on without pulling extra dependencies they might not want.
> What do others think about a separate repo?
> Is the intent to release it independently of the Parquet-format spec? I see the Variant type also has a version.
> Julien
>
> On Fri, Aug 23, 2024 at 4:31 PM Daniel Weeks <dwe...@apache.org> wrote:
>
>> Julien,
>>
>> I think there's interest in supporting multiple language implementations for variant (java/scala/cpp/rust/etc), so we might want to consider having a 'parquet-varient' repository to house the spec and language implementations. That might also help to keep them aligned, but open to other suggestions.
>>
>> -Dan
>>
>> On Fri, Aug 23, 2024 at 3:07 PM Julien Le Dem <jul...@apache.org> wrote:
>>
>> > Hello,
>> > I think it is great that we are converging on a Variant type.
>> > For the parquet-java implementation, it looks like it could be as easy as importing the spark implementation [1]?
>> > I'm not sure this is actually blocking anything as I'm assuming this gets stored in a binary type today.
>> > Is there an existing Cpp implementation?
>> > Are there other existing types defined somewhere else solving that same need that we should be paying attention to? (or should become compatible with this)
>> > Best
>> > Julien
>> > [1] https://github.com/apache/spark/tree/master/common/variant/src/main/java/org/apache/spark/types/variant
>> >
>> > On Fri, Aug 23, 2024 at 2:17 PM Jacques Nadeau <jacq...@apache.org> wrote:
>> >
>> > > > Do we have volunteers to implement it in Parquet-java + another OSS implementation?
>> > >
>> > > I don't think that should be a blocker for incorporating. I'd be inclined to do something like mark it as experimental or similar in the spec until the reference impls are done.
>> > >
>> > > On Fri, Aug 23, 2024 at 10:32 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>> > >
>> > > > I'm in favor of this, but wondering on the logistics. Do we have volunteers to implement it in Parquet-java + another OSS implementation or are we going to bypass this requirement for now?
>> > > >
>> > > > Thanks,
>> > > > Micah
>> > > >
>> > > > On Friday, August 23, 2024, Ryan Blue <b...@databricks.com.invalid> wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > On Fri, Aug 23, 2024 at 12:30 PM Jacques Nadeau <jacq...@apache.org> wrote:
>> > > > >
>> > > > > > +1
>> > > > > >
>> > > > > > On Fri, Aug 23, 2024 at 8:51 AM Nong Li <non...@gmail.com> wrote:
>> > > > > >
>> > > > > > > +1.
>> > > > > > >
>> > > > > > > On Fri, Aug 23, 2024 at 12:57 PM Jan Finis <jpfi...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > I would also appreciate having native Variant support in Parquet.
>> > > > > > > >
>> > > > > > > > On Fri, Aug 23, 2024 at 12:10, Fokko Driesprong <fo...@apache.org> wrote:
>> > > > > > > >
>> > > > > > > > > Hey Gang,
>> > > > > > > > >
>> > > > > > > > > Thanks for raising this. +1 from my end.
>> > > > > > > > >
>> > > > > > > > > For context, as Gang mentioned, when proposing to add a Variant Type to Iceberg <https://github.com/apache/iceberg/issues/10392>, one of the future goals was to integrate more closely with Parquet, and having the spec at Parquet will help to speed this up.
>> > > > > > > > >
>> > > > > > > > > Kind regards,
>> > > > > > > > > Fokko
>> > > > > > > > >
>> > > > > > > > > On Fri, Aug 23, 2024 at 11:37, Gábor Szádovszky <ga...@apache.org> wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Gang,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for bringing this up.
>> > > > > > > > > >
>> > > > > > > > > > I think that if Variant type would have come up earlier (before iceberg/arrow), its natural place would have been at the file format level as any other types. The communities started discussing where it should be placed because now we have different type systems at different places.
>> > > > > > > > > > Also, the current spec of Variant makes it more or less independent from the Parquet file format.
>> > > > > > > > > > However, even at Parquet level, we would need at least an additional Logical type to help handle Variant type by the systems reading/writing Parquet.
>> > > > > > > > > >
>> > > > > > > > > > To summarize my opinion, +1 for having the whole Variant spec in Parquet format.
>> > > > > > > > > >
>> > > > > > > > > > Cheers,
>> > > > > > > > > > Gabor
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Aug 23, 2024 at 11:18, Gang Wu <ust...@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi,
>> > > > > > > > > > >
>> > > > > > > > > > > Apache Iceberg is adding variant type support [1][2] by adopting the variant spec [3] from Apache Spark.
>> > > > > > > > > > > As the proposal is getting mature, both Iceberg [4] and Spark [5] communities are discussing moving the variant type to Parquet repo to avoid divergence. Moving it into Parquet makes the variant spec engine and table format agnostic, which may encourage wider adoption.
>> > > > > > > > > > >
>> > > > > > > > > > > What do people from Parquet community think?
>> > > > > > > > > > >
>> > > > > > > > > > > [1] https://lists.apache.org/thread/xnyo1k66dxh0ffpg7j9f04xgos0kwc34
>> > > > > > > > > > > [2] https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq
>> > > > > > > > > > > [3] https://github.com/apache/spark/blob/d84f1a3575c4125009374521d2f179089ebd71ad/common/variant/README.md
>> > > > > > > > > > > [4] https://lists.apache.org/thread/hopkr2f0ftoywwt9zo3jxb7n0ob5s5bw
>> > > > > > > > > > > [5] https://lists.apache.org/thread/0k5oj3mn0049fcxoxm3gx3d7r28gw4rj
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Gang
>> > > > >
>> > > > > --
>> > > > > Ryan Blue
>> > > > > Databricks