Agreed that Parquet would be a good place to host the new type. Different table formats, like Iceberg and Delta, can benefit from it, as they are already based on Parquet.

Yufei

On Wed, Aug 21, 2024 at 12:15 AM Alkis Evlogimenos <alkis.evlogime...@databricks.com.invalid> wrote:

> +1
>
> In addition to everything said above, it is also a great opportunity for
> wider testing and possibly tweaking the spec before it takes off post
> standardization.
>
> On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> I think this would be a great move to encourage all sorts of engines and
>> table formats to take advantage of variant type and make sure it remains
>> compatible between all those systems.
>>
>> I strongly support this,
>> Russ
>>
>> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <fo...@apache.org>
>> wrote:
>>
>>> Hey everyone,
>>>
>>> I agree the Parquet project is a good place to host and evolve the spec
>>> (we could store it in parquet-variant?). We would need to align this with
>>> the Parquet project. Anyway, I'm familiar both with Iceberg and Parquet and
>>> happy to help where needed.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Mon, Aug 19, 2024 at 4:36 PM Reynold Xin
>>> <r...@databricks.com.invalid> wrote:
>>>
>>>> As I said on dev@iceberg, it'd be really unfortunate if we end up with
>>>> two or even more diverging specs for storing variants. It just adds more
>>>> work for everybody to interop. Parquet would be a great home for this spec
>>>> as a neutral project that almost all the other important projects in this
>>>> space depend on as the de facto standard for physical data encoding and
>>>> storage. So if we can collaborate with the Parquet community and get this
>>>> into Parquet to avoid each project building its own spec, that'd be
>>>> amazing.
>>>>
>>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am one of the main developers implementing Variant in Spark. The
>>>>> specification and all the code are currently merged into the
>>>>> common/variant
>>>>> <https://github.com/apache/spark/tree/master/common/variant> package
>>>>> in the Spark repo.
>>>>>
>>>>> There has been growing interest from other projects (such as Iceberg)
>>>>> in supporting Variant, and we think that moving the Variant spec and
>>>>> implementation out to a new home might be the best way for all the
>>>>> different projects to be able to use and collaborate on Variant. We
>>>>> originally put all the Variant code under common/variant with the
>>>>> expectation that eventually it would be moved elsewhere.
>>>>>
>>>>> We are proposing that we move the Variant spec and implementation out
>>>>> of the Spark project, to the Parquet project. Spark depends heavily on
>>>>> Parquet, and the Variant spec contains a lot of details on the physical
>>>>> storage layer, such as shredding. The Parquet project would be a great
>>>>> place to standardize the Variant data type, and to enable interoperability
>>>>> across many different projects. However, even when we move Variant out, we
>>>>> expect to retain the compatibility with the current Spark implementation.
>>>>>
>>>>> What do people think? There are probably many details we still need to
>>>>> figure out in terms of moving the implementation, but at a high-level,
>>>>> does it make sense to move Variant to Parquet?
>>>>>
>>>>> I appreciate your feedback!
>>>>>
>>>>> Thanks,
>>>>> Gene