Hi all, I am one of the main developers implementing Variant in Spark. The specification and all the code are currently merged into the common/variant <https://github.com/apache/spark/tree/master/common/variant> package in the Spark repo.
There has been growing interest from other projects (such as Iceberg) in supporting Variant, and we think that moving the Variant spec and implementation to a new home might be the best way for different projects to use and collaborate on Variant. We originally put all the Variant code under common/variant with the expectation that it would eventually be moved elsewhere.

We are proposing to move the Variant spec and implementation out of the Spark project and into the Parquet project. Spark depends heavily on Parquet, and the Variant spec contains a lot of detail about the physical storage layer, such as shredding. The Parquet project would be a great place to standardize the Variant data type and to enable interoperability across many different projects. Even after Variant moves out, we expect to retain compatibility with the current Spark implementation.

What do people think? There are probably many details we still need to figure out about moving the implementation, but at a high level, does it make sense to move Variant to Parquet?

I appreciate your feedback!

Thanks,
Gene