Re: [DISCUSS] Move Variant to Parquet?

Jungtaek Lim Wed, 21 Aug 2024 16:43:17 -0700

+1. There is a common problem with moving something out, "synchronization of
release cadence", but I believe we can make the migration smooth.


On Thu, Aug 22, 2024 at 6:54 AM Yufei Gu <[email protected]> wrote:

> Agreed that Parquet would be a good place to host the new type. Different
> table formats, like Iceberg and Delta, can benefit from it as they are
> already based on Parquet.
>
> Yufei
>
>
> On Wed, Aug 21, 2024 at 12:15 AM Alkis Evlogimenos
> <[email protected]> wrote:
>
>> +1
>>
>> In addition to everything said above, it is also a great opportunity for
>> wider testing and possibly tweaking the spec before it takes off post
>> standardization.
>>
>> On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer <
>> [email protected]> wrote:
>>
>>> I think this would be a great move to encourage all sorts of engines and
>>> table formats to take advantage of variant type and make sure it remains
>>> compatible between all those systems.
>>>
>>> I strongly support this,
>>> Russ
>>>
>>> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <[email protected]>
>>> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I agree the Parquet project is a good place to host and evolve the spec
>>>> (we could store it in parquet-variant?). We would need to align this with
>>>> the Parquet project. Anyway, I'm familiar with both Iceberg and Parquet and
>>>> happy to help where needed.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>>
>>>> On Mon, Aug 19, 2024 at 4:36 PM Reynold Xin
>>>> <[email protected]> wrote:
>>>>
>>>>> As I said on dev@iceberg, it'd be really unfortunate if we end up
>>>>> with two or even more diverging specs for storing variants. It just adds
>>>>> more work for everybody to interop. Parquet would be a great home for this
>>>>> spec as a neutral project that almost all the other important projects in
>>>>> this space depend on as the de facto standard for physical data encoding
>>>>> and storage. So if we can collaborate with the Parquet community and get
>>>>> this into Parquet to avoid each project building its own spec, that'd be
>>>>> amazing.
>>>>>
>>>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am one of the main developers implementing Variant in Spark. The
>>>>>> specification and all the code are currently merged into the
>>>>>> common/variant <https://github.com/apache/spark/tree/master/common/variant>
>>>>>> package in the Spark repo.
>>>>>>
>>>>>> There has been growing interest from other projects (such as Iceberg)
>>>>>> in supporting Variant, and we think that moving the Variant spec and
>>>>>> implementation out to a new home might be the best way for all the
>>>>>> different projects to be able to use and collaborate on Variant. We
>>>>>> originally put all the Variant code under common/variant with the
>>>>>> expectation that eventually it would be moved elsewhere.
>>>>>>
>>>>>> We are proposing that we move the Variant spec and implementation out
>>>>>> of the Spark project and into the Parquet project. Spark depends heavily
>>>>>> on Parquet, and the Variant spec contains a lot of details on the
>>>>>> physical storage layer, such as shredding. The Parquet project would be
>>>>>> a great place to standardize the Variant data type and to enable
>>>>>> interoperability across many different projects. However, even when we
>>>>>> move Variant out, we expect to retain compatibility with the current
>>>>>> Spark implementation.
>>>>>>
>>>>>> What do people think? There are probably many details we still need
>>>>>> to figure out in terms of moving the implementation, but at a high-level,
>>>>>> does it make sense to move Variant to Parquet?
>>>>>>
>>>>>> I appreciate your feedback!
>>>>>>
>>>>>> Thanks,
>>>>>> Gene
>>>>>>
>>>>>
