Re: [DISCUSS] Move Variant to Parquet?

Russell Spitzer Tue, 20 Aug 2024 07:36:23 -0700

I think this would be a great move to encourage all sorts of engines and
table formats to take advantage of the variant type, and to make sure it
remains compatible across all those systems.


I strongly support this,
Russ

On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <[email protected]> wrote:

> Hey everyone,
>
> I agree the Parquet project is a good place to host and evolve the spec
> (we could store it in parquet-variant?). We would need to align this with
> the Parquet project. Anyway, I'm familiar with both Iceberg and Parquet and
> happy to help where needed.
>
> Kind regards,
> Fokko
>
>
> On Mon, Aug 19, 2024 at 4:36 PM Reynold Xin <[email protected]> wrote:
>
>> As I said on dev@iceberg, it'd be really unfortunate if we end up with
>> two or even more diverging specs for storing variants. It just adds more
>> work for everybody to interop. Parquet would be a great home for this spec
>> as a neutral project that almost all the other important projects in this
>> space depend on as the de facto standard for physical data encoding and
>> storage. So if we can collaborate with the Parquet community and get this
>> into Parquet to avoid each project building its own spec, that'd be amazing.
>>
>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I am one of the main developers implementing Variant in Spark. The
>>> specification and all the code are currently merged into the
>>> common/variant
>>> <https://github.com/apache/spark/tree/master/common/variant> package in
>>> the Spark repo.
>>>
>>> There has been growing interest from other projects (such as Iceberg) in
>>> supporting Variant, and we think that moving the Variant spec and
>>> implementation out to a new home might be the best way for all the
>>> different projects to be able to use and collaborate on Variant. We
>>> originally put all the Variant code under common/variant with the
>>> expectation that eventually it would be moved elsewhere.
>>>
>>> We are proposing that we move the Variant spec and implementation out of
>>> the Spark project, to the Parquet project. Spark depends heavily on
>>> Parquet, and the Variant spec contains a lot of details on the physical
>>> storage layer, such as shredding. The Parquet project would be a great
>>> place to standardize the Variant data type, and to enable interoperability
>>> across many different projects. However, even after we move Variant out, we
>>> expect to retain compatibility with the current Spark implementation.
>>>
>>> What do people think? There are probably many details we still need to
>>> figure out in terms of moving the implementation, but at a high level, does
>>> it make sense to move Variant to Parquet?
>>>
>>> I appreciate your feedback!
>>>
>>> Thanks,
>>> Gene
>>>
>>
