+1 on moving this to the Parquet project/community (assuming that the Parquet community is ok with this)

On Thu, Aug 22, 2024 at 3:02 AM Chao Sun <sunc...@apache.org> wrote:

> +1 too
>
> On Wed, Aug 21, 2024 at 4:43 PM huaxin gao <huaxin.ga...@gmail.com> wrote:
>
>> +1 for moving variant type to Parquet, as it promotes standardization
>> and interoperability across numerous projects.
>>
>> Huaxin
>>
>> On Wed, Aug 21, 2024 at 1:28 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> Agreed that Parquet would be a good place to host the new type.
>>> Different table formats, like Iceberg and Delta, can benefit from it
>>> as they are already based on Parquet.
>>>
>>> Yufei
>>>
>>> On Wed, Aug 21, 2024 at 12:15 AM Alkis Evlogimenos
>>> <alkis.evlogime...@databricks.com.invalid> wrote:
>>>
>>>> +1
>>>>
>>>> In addition to everything said above, it is also a great opportunity
>>>> for wider testing and possibly tweaking the spec before it takes off
>>>> post standardization.
>>>>
>>>> On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer
>>>> <russell.spit...@gmail.com> wrote:
>>>>
>>>>> I think this would be a great move to encourage all sorts of engines
>>>>> and table formats to take advantage of variant type and make sure it
>>>>> remains compatible between all those systems.
>>>>>
>>>>> I strongly support this,
>>>>> Russ
>>>>>
>>>>> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <fo...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> I agree the Parquet project is a good place to host and evolve the
>>>>>> spec (we could store it in parquet-variant?). We would need to align
>>>>>> this with the Parquet project. Anyway, I'm familiar with both
>>>>>> Iceberg and Parquet and happy to help where needed.
>>>>>>
>>>>>> Kind regards,
>>>>>> Fokko
>>>>>>
>>>>>> On Mon, Aug 19, 2024 at 16:36, Reynold Xin
>>>>>> <r...@databricks.com.invalid> wrote:
>>>>>>
>>>>>>> As I said on dev@iceberg, it'd be really unfortunate if we end up
>>>>>>> with two or even more diverging specs for storing variants. It just
>>>>>>> adds more work for everybody to interop. Parquet would be a great
>>>>>>> home for this spec as a neutral project that almost all the other
>>>>>>> important projects in this space depend on as the de facto standard
>>>>>>> for physical data encoding and storage. So if we can collaborate
>>>>>>> with the Parquet community and get this into Parquet to avoid each
>>>>>>> project building its own spec, that'd be amazing.
>>>>>>>
>>>>>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am one of the main developers implementing Variant in Spark. The
>>>>>>>> specification and all the code are currently merged into the
>>>>>>>> common/variant
>>>>>>>> <https://github.com/apache/spark/tree/master/common/variant>
>>>>>>>> package in the Spark repo.
>>>>>>>>
>>>>>>>> There has been growing interest from other projects (such as
>>>>>>>> Iceberg) in supporting Variant, and we think that moving the
>>>>>>>> Variant spec and implementation out to a new home might be the
>>>>>>>> best way for all the different projects to be able to use and
>>>>>>>> collaborate on Variant. We originally put all the Variant code
>>>>>>>> under common/variant with the expectation that eventually it would
>>>>>>>> be moved elsewhere.
>>>>>>>>
>>>>>>>> We are proposing that we move the Variant spec and implementation
>>>>>>>> out of the Spark project, to the Parquet project. Spark depends
>>>>>>>> heavily on Parquet, and the Variant spec contains a lot of details
>>>>>>>> on the physical storage layer, such as shredding. The Parquet
>>>>>>>> project would be a great place to standardize the Variant data
>>>>>>>> type, and to enable interoperability across many different
>>>>>>>> projects. However, even when we move Variant out, we expect to
>>>>>>>> retain compatibility with the current Spark implementation.
>>>>>>>>
>>>>>>>> What do people think? There are probably many details we still
>>>>>>>> need to figure out in terms of moving the implementation, but at a
>>>>>>>> high level, does it make sense to move Variant to Parquet?
>>>>>>>>
>>>>>>>> I appreciate your feedback!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gene