+1. Moving something out usually brings a common problem, synchronizing release cadence, but I believe we can make the migration smooth.
On Thu, Aug 22, 2024 at 6:54 AM Yufei Gu <flyrain...@gmail.com> wrote:

> Agreed that Parquet would be a good place to host the new type. Different
> table formats, like Iceberg and Delta, can benefit from it as they are
> already based on Parquet.
>
> Yufei
>
> On Wed, Aug 21, 2024 at 12:15 AM Alkis Evlogimenos
> <alkis.evlogime...@databricks.com.invalid> wrote:
>
>> +1
>>
>> In addition to everything said above, it is also a great opportunity for
>> wider testing and possibly tweaking the spec before it takes off post
>> standardization.
>>
>> On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer
>> <russell.spit...@gmail.com> wrote:
>>
>>> I think this would be a great move to encourage all sorts of engines and
>>> table formats to take advantage of variant type and make sure it remains
>>> compatible between all those systems.
>>>
>>> I strongly support this,
>>> Russ
>>>
>>> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <fo...@apache.org>
>>> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I agree the Parquet project is a good place to host and evolve the spec
>>>> (we could store it in parquet-variant?). We would need to align this with
>>>> the Parquet project. Anyway, I'm familiar with both Iceberg and Parquet and
>>>> happy to help where needed.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Mon, Aug 19, 2024 at 4:36 PM Reynold Xin
>>>> <r...@databricks.com.invalid> wrote:
>>>>
>>>>> As I said on dev@iceberg, it'd be really unfortunate if we end up
>>>>> with two or even more diverging specs for storing variants. It just adds
>>>>> more work for everybody to interop. Parquet would be a great home for this
>>>>> spec as a neutral project that almost all the other important projects in
>>>>> this space depend on as the de facto standard for physical data encoding
>>>>> and storage. So if we can collaborate with the Parquet community and get
>>>>> this into Parquet to avoid each project building its own spec, that'd be
>>>>> amazing.
>>>>>
>>>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am one of the main developers implementing Variant in Spark. The
>>>>>> specification and all the code are currently merged into the
>>>>>> common/variant
>>>>>> <https://github.com/apache/spark/tree/master/common/variant> package
>>>>>> in the Spark repo.
>>>>>>
>>>>>> There has been growing interest from other projects (such as Iceberg)
>>>>>> in supporting Variant, and we think that moving the Variant spec and
>>>>>> implementation out to a new home might be the best way for all the
>>>>>> different projects to be able to use and collaborate on Variant. We
>>>>>> originally put all the Variant code under common/variant with the
>>>>>> expectation that eventually it would be moved elsewhere.
>>>>>>
>>>>>> We are proposing that we move the Variant spec and implementation out
>>>>>> of the Spark project, to the Parquet project. Spark depends heavily on
>>>>>> Parquet, and the Variant spec contains a lot of details on the physical
>>>>>> storage layer, such as shredding. The Parquet project would be a great
>>>>>> place to standardize the Variant data type, and to enable
>>>>>> interoperability across many different projects. However, even when we
>>>>>> move Variant out, we expect to retain the compatibility with the
>>>>>> current Spark implementation.
>>>>>>
>>>>>> What do people think? There are probably many details we still need
>>>>>> to figure out in terms of moving the implementation, but at a high-level,
>>>>>> does it make sense to move Variant to Parquet?
>>>>>>
>>>>>> I appreciate your feedback!
>>>>>>
>>>>>> Thanks,
>>>>>> Gene
>>>>>>
>>>>>