Hey All,

There have been various discussions in many different places (issues, PRs, and so on)[1][2][3], plus more that I haven't linked to, about what a canonical Variant Extension Type for Arrow might look like. While looking into implementing some things, I've also spoken with members of the Arrow, Iceberg, and Parquet communities about what a good representation for an Arrow Variant would look like, in order to ensure good support and adoption.

I also looked at the ClickHouse Variant implementation [4]. The ClickHouse Variant is nearly equivalent to the Arrow Dense Union type, so we don't need to do any extra work there to support it.

So, after these discussions and after looking into the needs of engines and so on, I've iterated on and written up a proposal, in a Google doc [5], for what a Canonical Variant Extension Type for Arrow could be. I'm hoping that this can spark some discussion and comments on the document. If there's relative consensus on it, then I'll work on creating some implementations of it that I can use to formally propose the addition to the Canonical Extensions.

Please take a read and leave comments on the Google doc or on this thread.

Thanks everyone!

--Matt

[1]: https://github.com/apache/arrow-rs/issues/7063
[2]: https://github.com/apache/arrow/issues/45937
[3]: https://github.com/apache/arrow/pull/45375#issuecomment-2649807352
[4]: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
[5]: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing