Thanks for pursuing this! This is one of the questions I had while reviewing the beginnings of Variant support in Parquet C++ and Go. I would love to see this as a canonical extension type, both for its utility as a type and as further incentive to strengthen the extension system.

Cheers,

-dewey

On Thu, May 8, 2025 at 9:05 PM Ian Cook <ianmc...@apache.org> wrote:

> As Parquet adds new types including Variant, I think it's important
> for us to (a) preserve the ability to efficiently round-trip the full
> set of types between Arrow and Parquet, and (b) add new Arrow types
> (or canonical extension types) to enable Arrow to continue to flourish
> in its role as the in-memory and on-the-wire counterpart to Parquet
> storage. Thank you Matt for pursuing this.
>
> Ian
>
>
> On Thu, May 8, 2025 at 6:03 PM Matt Topol <zotthewiz...@gmail.com> wrote:
> >
> > Hey All,
> >
> > There's been various discussions occurring on many different thread
> > locations (issues, PRs, and so on)[1][2][3], and more that I haven't
> > linked to, concerning what a canonical Variant Extension Type for
> > Arrow might look like. As I've looked into implementing some things,
> > I've also spoken with members of the Arrow, Iceberg and Parquet
> > communities as to what a good representation for Arrow Variant would
> > be like in order to ensure good support and adoption.
> >
> > I also looked at the ClickHouse variant implementation [4]. The
> > ClickHouse Variant is nearly equivalent to the Arrow Dense Union type,
> > so we don't need to do any extra work there to support it.
> >
> > So, after discussions and looking into the needs for engines and so
> > on, I've iterated and written up a proposal for what a Canonical
> > Variant Extension Type for Arrow could be in a google doc[5]. I'm
> > hoping that this can spark some discussion and comments on the
> > document. If there's relative consensus on it, then I'll work on
> > creating some implementations of it that I can use to formally propose
> > the addition to the Canonical Extensions.
> >
> > Please take a read and leave comments on the google doc or on this
> > thread. Thanks everyone!
> >
> > --Matt
> >
> > [1]: https://github.com/apache/arrow-rs/issues/7063
> > [2]: https://github.com/apache/arrow/issues/45937
> > [3]: https://github.com/apache/arrow/pull/45375#issuecomment-2649807352
> > [4]: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
> > [5]: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing