A separate repo for the variant type makes sense to me. And I don't think
we need two reference implementations ready before adoption, because it is
already a released spec.

> Is the intent to release it independently of the Parquet-format spec?
> I see the Variant type also has a version.

IIUC, the version field in the variant spec indicates how the variant data
is encoded. If that is the case, we should bump the parquet-format version
whenever a new encoding scheme is introduced.
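For illustration, here is a minimal sketch of how a reader might check that version field. It assumes the metadata header layout described in the Spark variant README: the first metadata byte holds the version in bits 0-3, `sorted_strings` in bit 4, and `offset_size_minus_one` in bits 6-7, and the version is currently required to be 1. The helper name `parse_metadata_header` is hypothetical and is not part of any existing library.

```python
# Hypothetical helper, not an existing API. Decodes the first byte of a
# variant metadata buffer, assuming the Spark variant README header layout:
#   bits 0-3: version, bit 4: sorted_strings, bits 6-7: offset_size_minus_one
SUPPORTED_VERSION = 1  # assumption: the only encoding version defined so far


def parse_metadata_header(metadata: bytes) -> dict:
    if not metadata:
        raise ValueError("empty variant metadata")
    header = metadata[0]
    version = header & 0x0F
    if version != SUPPORTED_VERSION:
        # A reader rejects encodings it does not know. This is why a new
        # encoding scheme would need a version (and spec) bump.
        raise ValueError(f"unsupported variant metadata version: {version}")
    return {
        "version": version,
        "sorted_strings": bool((header >> 4) & 0x01),
        "offset_size": ((header >> 6) & 0x03) + 1,  # 1 to 4 bytes
    }
```

For example, `parse_metadata_header(bytes([0x11]))` reports version 1 with sorted dictionary strings and 1-byte offsets. A header byte such as `0x02` is rejected as an unknown version.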

Best,
Gang

On Sat, Aug 24, 2024 at 8:43 AM Julien Le Dem <[email protected]> wrote:

> (Note: I am also catching up on the threads linked in the email)
>
> On Fri, Aug 23, 2024 at 5:38 PM Julien Le Dem <[email protected]> wrote:
>
> > I am in favor of making this a separate artifact that other projects can
> > depend on without pulling in extra dependencies they might not want.
> > What do others think about a separate repo?
> > Is the intent to release it independently of the Parquet-format spec? I
> > see the Variant type also has a version.
> > Julien
> >
> > On Fri, Aug 23, 2024 at 4:31 PM Daniel Weeks <[email protected]> wrote:
> >
> >> Julien,
> >>
> >> I think there's interest in supporting multiple language implementations
> >> for variant (java/scala/cpp/rust/etc), so we might want to consider having
> >> a 'parquet-variant' repository to house the spec and language
> >> implementations. That might also help to keep them aligned, but I'm open
> >> to other suggestions.
> >>
> >> -Dan
> >>
> >> On Fri, Aug 23, 2024 at 3:07 PM Julien Le Dem <[email protected]> wrote:
> >>
> >> > Hello,
> >> > I think it is great that we are converging on a Variant type.
> >> > For the parquet-java implementation, it looks like it could be as easy
> >> > as importing the Spark implementation [1]?
> >> > I'm not sure this is actually blocking anything, as I'm assuming this
> >> > gets stored in a binary type today.
> >> > Is there an existing C++ implementation?
> >> > Are there other existing types defined elsewhere that solve the same
> >> > need and that we should be paying attention to (or become compatible
> >> > with)?
> >> > Best
> >> > Julien
> >> > [1]
> >> > https://github.com/apache/spark/tree/master/common/variant/src/main/java/org/apache/spark/types/variant
> >> >
> >> >
> >> > On Fri, Aug 23, 2024 at 2:17 PM Jacques Nadeau <[email protected]> wrote:
> >> >
> >> > > > Do we have volunteers to implement it in Parquet-java + another
> >> > > > OSS implementation?
> >> > >
> >> > > I don't think that should be a blocker for incorporating it. I'd be
> >> > > inclined to do something like marking it as experimental (or similar)
> >> > > in the spec until the reference implementations are done.
> >> > >
> >> > > On Fri, Aug 23, 2024 at 10:32 AM Micah Kornfield <[email protected]> wrote:
> >> > >
> >> > > > I'm in favor of this, but I'm wondering about the logistics. Do we
> >> > > > have volunteers to implement it in Parquet-java + another OSS
> >> > > > implementation, or are we going to bypass this requirement for now?
> >> > > >
> >> > > > Thanks,
> >> > > > Micah
> >> > > >
> >> > > > On Friday, August 23, 2024, Ryan Blue <[email protected]> wrote:
> >> > > >
> >> > > > > +1
> >> > > > >
> >> > > > > On Fri, Aug 23, 2024 at 12:30 PM Jacques Nadeau <[email protected]> wrote:
> >> > > > >
> >> > > > > > +1
> >> > > > > >
> >> > > > > > On Fri, Aug 23, 2024 at 8:51 AM Nong Li <[email protected]> wrote:
> >> > > > > >
> >> > > > > > > +1.
> >> > > > > > >
> >> > > > > > > On Fri, Aug 23, 2024 at 12:57 PM Jan Finis <[email protected]> wrote:
> >> > > > > > >
> >> > > > > > > > I would also appreciate having native Variant support in Parquet.
> >> > > > > > > >
> >> > > > > > > > On Fri, Aug 23, 2024 at 12:10 PM Fokko Driesprong <[email protected]> wrote:
> >> > > > > > > >
> >> > > > > > > > > Hey Gang,
> >> > > > > > > > >
> >> > > > > > > > > Thanks for raising this. +1 from my end.
> >> > > > > > > > >
> >> > > > > > > > > For context, as Gang mentioned, when adding a Variant type to
> >> > > > > > > > > Iceberg <https://github.com/apache/iceberg/issues/10392> was
> >> > > > > > > > > proposed, one of the future goals was to integrate more closely
> >> > > > > > > > > with Parquet, and having the spec in Parquet will help speed
> >> > > > > > > > > this up.
> >> > > > > > > > >
> >> > > > > > > > > Kind regards,
> >> > > > > > > > > Fokko
> >> > > > > > > > >
> >> > > > > > > > > On Fri, Aug 23, 2024 at 11:37 AM Gábor Szádovszky <[email protected]> wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi Gang,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for bringing this up.
> >> > > > > > > > > >
> >> > > > > > > > > > I think that if the Variant type had come up earlier (before
> >> > > > > > > > > > Iceberg/Arrow), its natural place would have been at the file
> >> > > > > > > > > > format level, like any other type. The communities started
> >> > > > > > > > > > discussing where it should be placed because we now have
> >> > > > > > > > > > different type systems in different places.
> >> > > > > > > > > > Also, the current Variant spec makes it more or less
> >> > > > > > > > > > independent of the Parquet file format.
> >> > > > > > > > > > However, even at the Parquet level, we would need at least an
> >> > > > > > > > > > additional logical type to help systems that read/write
> >> > > > > > > > > > Parquet handle the Variant type.
> >> > > > > > > > > >
> >> > > > > > > > > > To summarize my opinion: +1 for having the whole Variant spec
> >> > > > > > > > > > in the Parquet format.
> >> > > > > > > > > >
> >> > > > > > > > > > Cheers,
> >> > > > > > > > > > Gabor
> >> > > > > > > > > >
> >> > > > > > > > > > On Fri, Aug 23, 2024 at 11:18 AM Gang Wu <[email protected]> wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hi,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Apache Iceberg is adding variant type support [1][2] by adopting
> >> > > > > > > > > > > the variant spec [3] from Apache Spark. As the proposal matures,
> >> > > > > > > > > > > both the Iceberg [4] and Spark [5] communities are discussing
> >> > > > > > > > > > > moving the variant type to the Parquet repo to avoid divergence.
> >> > > > > > > > > > > Moving it into Parquet makes the variant spec agnostic of both
> >> > > > > > > > > > > engine and table format, which may encourage wider adoption.
> >> > > > > > > > > > >
> >> > > > > > > > > > > What do people in the Parquet community think?
> >> > > > > > > > > > >
> >> > > > > > > > > > > [1] https://lists.apache.org/thread/xnyo1k66dxh0ffpg7j9f04xgos0kwc34
> >> > > > > > > > > > > [2] https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq
> >> > > > > > > > > > > [3] https://github.com/apache/spark/blob/d84f1a3575c4125009374521d2f179089ebd71ad/common/variant/README.md
> >> > > > > > > > > > > [4] https://lists.apache.org/thread/hopkr2f0ftoywwt9zo3jxb7n0ob5s5bw
> >> > > > > > > > > > > [5] https://lists.apache.org/thread/0k5oj3mn0049fcxoxm3gx3d7r28gw4rj
> >> > > > > > > > > > >
> >> > > > > > > > > > > Best,
> >> > > > > > > > > > > Gang
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > Ryan Blue
> >> > > > > Databricks
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
