As Parquet adds new types including Variant, I think it's important for us to (a) preserve the ability to efficiently round-trip the full set of types between Arrow and Parquet, and (b) add new Arrow types (or canonical extension types) so that Arrow can continue to flourish in its role as the in-memory and on-the-wire counterpart to Parquet storage. Thank you, Matt, for pursuing this.

Ian

On Thu, May 8, 2025 at 6:03 PM Matt Topol <zotthewiz...@gmail.com> wrote:
>
> Hey All,
>
> There's been various discussions occurring on many different thread
> locations (issues, PRs, and so on)[1][2][3], and more that I haven't
> linked to, concerning what a canonical Variant Extension Type for
> Arrow might look like. As I've looked into implementing some things,
> I've also spoken with members of the Arrow, Iceberg and Parquet
> communities as to what a good representation for Arrow Variant would
> be like in order to ensure good support and adoption.
>
> I also looked at the ClickHouse variant implementation [4]. The
> ClickHouse Variant is nearly equivalent to the Arrow Dense Union type,
> so we don't need to do any extra work there to support it.
>
> So, after discussions and looking into the needs for engines and so
> on, I've iterated and written up a proposal for what a Canonical
> Variant Extension Type for Arrow could be in a google doc[5]. I'm
> hoping that this can spark some discussion and comments on the
> document. If there's relative consensus on it, then I'll work on
> creating some implementations of it that I can use to formally propose
> the addition to the Canonical Extensions.
>
> Please take a read and leave comments on the google doc or on this
> thread. Thanks everyone!
>
> --Matt
>
> [1]: https://github.com/apache/arrow-rs/issues/7063
> [2]: https://github.com/apache/arrow/issues/45937
> [3]: https://github.com/apache/arrow/pull/45375#issuecomment-2649807352
> [4]: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
> [5]: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing