(Note: I am also catching up on the threads linked in the email)

On Fri, Aug 23, 2024 at 5:38 PM Julien Le Dem <jul...@apache.org> wrote:

> I am in favor of making this a separate artifact that other projects can
> depend on without pulling extra dependencies they might not want.
> What do others think about a separate repo?
> Is the intent to release it independently of the Parquet-format spec? I
> see the Variant type also has a version.
> Julien
>
> On Fri, Aug 23, 2024 at 4:31 PM Daniel Weeks <dwe...@apache.org> wrote:
>
>> Julien,
>>
>> I think there's interest in supporting multiple language implementations
>> for variant (java/scala/cpp/rust/etc), so we might want to consider having
>> a 'parquet-variant' repository to house the spec and language
>> implementations.  That might also help to keep them aligned, but open to
>> other suggestions.
>>
>> -Dan
>>
>> On Fri, Aug 23, 2024 at 3:07 PM Julien Le Dem <jul...@apache.org> wrote:
>>
>> > Hello,
>> > I think it is great that we are converging on a Variant type.
>> > For the parquet-java implementation, it looks like it could be as easy
>> as
>> > importing the spark implementation [1]?
>> > I'm not sure this is actually blocking anything as I'm assuming this
>> gets
>> > stored in a binary type today.
>> > Is there an existing Cpp implementation?
>> > Are there other existing types defined somewhere else solving that same
>> > need that we should be paying attention to? (or should become compatible
>> > with this)
>> > Best
>> > Julien
>> > [1] https://github.com/apache/spark/tree/master/common/variant/src/main/java/org/apache/spark/types/variant
>> >
>> >
>> > On Fri, Aug 23, 2024 at 2:17 PM Jacques Nadeau <jacq...@apache.org>
>> wrote:
>> >
>> > > > Do we have volunteers to implement it in Parquet-java + another OSS
>> > > implementation?
>> > >
>> > > I don't think that should be a blocker for incorporating. I'd be
>> inclined
>> > > to do something like mark it as experimental or similar in the spec
>> until
>> > > the reference impls are done.
>> > >
>> > > On Fri, Aug 23, 2024 at 10:32 AM Micah Kornfield <
>> emkornfi...@gmail.com>
>> > > wrote:
>> > >
>> > > > I'm in favor of this, but wondering on the logistics.  Do we have
>> > > > volunteers to implement it in Parquet-java + another OSS
>> implementation
>> > > or
>> > > > are we going to bypass this requirement for now?
>> > > >
>> > > > Thanks,
>> > > > Micah
>> > > >
>> > > > On Friday, August 23, 2024, Ryan Blue <b...@databricks.com.invalid>
>> > > wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > On Fri, Aug 23, 2024 at 12:30 PM Jacques Nadeau <
>> jacq...@apache.org>
>> > > > > wrote:
>> > > > >
>> > > > > > +1
>> > > > > >
>> > > > > > On Fri, Aug 23, 2024 at 8:51 AM Nong Li <non...@gmail.com>
>> wrote:
>> > > > > >
>> > > > > > > +1.
>> > > > > > >
>> > > > > > > On Fri, Aug 23, 2024 at 12:57 PM Jan Finis <jpfi...@gmail.com
>> >
>> > > > wrote:
>> > > > > > >
>> > > > > > > > I would also appreciate having native Variant support in
>> > Parquet.
>> > > > > > > >
>> > > > > > > > On Fri, Aug 23, 2024 at 12:10 PM Fokko Driesprong <fo...@apache.org> wrote:
>> > > > > > > >
>> > > > > > > > > Hey Gang,
>> > > > > > > > >
>> > > > > > > > > Thanks for raising this. +1 from my end.
>> > > > > > > > >
>> > > > > > > > > For context, as Gang mentioned, when proposing to add a
>> > Variant
>> > > > > Type
>> > > > > > to
>> > > > > > > > > Iceberg <https://github.com/apache/iceberg/issues/10392>,
>> > one
>> > > of
>> > > > > the
>> > > > > > > > > future
>> > > > > > > > > goals was to integrate more closely with Parquet, and
>> having
>> > > the
>> > > > > spec
>> > > > > > > at
>> > > > > > > > > Parquet will help to speed this up.
>> > > > > > > > >
>> > > > > > > > > Kind regards,
>> > > > > > > > > Fokko
>> > > > > > > > >
>> > > > > > > > > On Fri, Aug 23, 2024 at 11:37 AM Gábor Szádovszky <ga...@apache.org> wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Gang,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for bringing this up.
>> > > > > > > > > >
>> > > > > > > > > > I think that if Variant type would have come up earlier
>> > > (before
>> > > > > > > > > > iceberg/arrow), its natural place would have been at the
>> > file
>> > > > > > format
>> > > > > > > > > level
>> > > > > > > > > > as any other types. The communities started discussing
>> > where
>> > > it
>> > > > > > > should
>> > > > > > > > be
>> > > > > > > > > > placed because now we have different type systems at
>> > > different
>> > > > > > > places.
>> > > > > > > > > > Also, the current spec of Variant makes it more or less
>> > > > > independent
>> > > > > > > > from
>> > > > > > > > > > the Parquet file format.
>> > > > > > > > > > However, even at Parquet level, we would need at least
>> an
>> > > > > > additional
>> > > > > > > > > > Logical type to help handle Variant type by the systems
>> > > > > > > reading/writing
>> > > > > > > > > > Parquet.
>> > > > > > > > > >
>> > > > > > > > > > To summarize my opinion, +1 for having the whole Variant
>> > spec
>> > > > in
>> > > > > > > > Parquet
>> > > > > > > > > > format.
>> > > > > > > > > >
>> > > > > > > > > > Cheers,
>> > > > > > > > > > Gabor
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Aug 23, 2024 at 11:18 AM Gang Wu <ust...@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi,
>> > > > > > > > > > >
>> > > > > > > > > > > Apache Iceberg is adding variant type support [1][2]
>> by
>> > > > > adopting
>> > > > > > > the
>> > > > > > > > > > > variant
>> > > > > > > > > > > spec [3] from Apache Spark. As the proposal is getting
>> > > > mature,
>> > > > > > both
>> > > > > > > > > > Iceberg
>> > > > > > > > > > > [4]
>> > > > > > > > > > > and Spark [5] communities are discussing moving the
>> > variant
>> > > > > type
>> > > > > > to
>> > > > > > > > > > Parquet
>> > > > > > > > > > > repo to avoid divergence. Moving it into Parquet makes
>> > the
>> > > > > > variant
>> > > > > > > > spec
>> > > > > > > > > > > engine
>> > > > > > > > > > > and table format agnostic, which may encourage wider
>> > > > adoption.
>> > > > > > > > > > >
>> > > > > > > > > > > What do people from Parquet community think?
>> > > > > > > > > > >
>> > > > > > > > > > > [1] https://lists.apache.org/thread/xnyo1k66dxh0ffpg7j9f04xgos0kwc34
>> > > > > > > > > > > [2] https://lists.apache.org/thread/xcyytoypgplfr74klg1z2rgjo6k5b0sq
>> > > > > > > > > > > [3] https://github.com/apache/spark/blob/d84f1a3575c4125009374521d2f179089ebd71ad/common/variant/README.md
>> > > > > > > > > > > [4] https://lists.apache.org/thread/hopkr2f0ftoywwt9zo3jxb7n0ob5s5bw
>> > > > > > > > > > > [5] https://lists.apache.org/thread/0k5oj3mn0049fcxoxm3gx3d7r28gw4rj
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Gang
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Ryan Blue
>> > > > > Databricks
>> > > > >
>> > > >
>> > >
>> >
>>
>
