I think this would be a great move to encourage all sorts of engines and table formats to take advantage of the Variant type, and to make sure it remains compatible across all those systems.
I strongly support this,
Russ

On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <fo...@apache.org> wrote:
> Hey everyone,
>
> I agree the Parquet project is a good place to host and evolve the spec
> (we could store it in parquet-variant?). We would need to align this with
> the Parquet project. Anyway, I'm familiar with both Iceberg and Parquet and
> happy to help where needed.
>
> Kind regards,
> Fokko
>
>
> On Mon, Aug 19, 2024 at 16:36, Reynold Xin <r...@databricks.com.invalid> wrote:
>
>> As I said on dev@iceberg, it'd be really unfortunate if we end up with
>> two or even more diverging specs for storing variants. It just adds more
>> work for everybody to interop. Parquet would be a great home for this spec
>> as a neutral project that almost all the other important projects in this
>> space depend on as the de facto standard for physical data encoding and
>> storage. So if we can collaborate with the Parquet community and get this
>> into Parquet to avoid each project building its own spec, that'd be amazing.
>>
>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am one of the main developers implementing Variant in Spark. The
>>> specification and all the code are currently merged into the
>>> common/variant
>>> <https://github.com/apache/spark/tree/master/common/variant> package in
>>> the Spark repo.
>>>
>>> There has been growing interest from other projects (such as Iceberg) in
>>> supporting Variant, and we think that moving the Variant spec and
>>> implementation out to a new home might be the best way for all the
>>> different projects to be able to use and collaborate on Variant. We
>>> originally put all the Variant code under common/variant with the
>>> expectation that eventually it would be moved elsewhere.
>>>
>>> We are proposing that we move the Variant spec and implementation out of
>>> the Spark project, to the Parquet project. Spark depends heavily on
>>> Parquet, and the Variant spec contains a lot of details on the physical
>>> storage layer, such as shredding. The Parquet project would be a great
>>> place to standardize the Variant data type, and to enable interoperability
>>> across many different projects. However, even when we move Variant out, we
>>> expect to retain compatibility with the current Spark implementation.
>>>
>>> What do people think? There are probably many details we still need to
>>> figure out in terms of moving the implementation, but at a high level, does
>>> it make sense to move Variant to Parquet?
>>>
>>> I appreciate your feedback!
>>>
>>> Thanks,
>>> Gene