Hey All,

There have been various discussions spread across many different
places (issues, PRs, and so on)[1][2][3], plus more that I haven't
linked to, about what a canonical Variant Extension Type for Arrow
might look like. While looking into implementing some of this, I've
also spoken with members of the Arrow, Iceberg, and Parquet
communities about what a good representation for an Arrow Variant
would look like, in order to ensure good support and adoption.

I also looked at the ClickHouse variant implementation [4]. The
ClickHouse Variant is nearly equivalent to the Arrow Dense Union type,
so we don't need to do any extra work there to support it.
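
For anyone less familiar with the dense union layout, here is a small
pure-Python sketch of the idea: one type id and one offset per slot,
plus one child array per alternative type. The names DenseUnion and
build_dense_union are made up for illustration and are not an Arrow
API, and the sketch assumes a type id is simply the child's index.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class DenseUnion:
    """Toy model of a dense union column (hypothetical, not an Arrow API)."""
    type_ids: List[int]        # per slot: which child holds the value
    offsets: List[int]         # per slot: position inside that child
    children: List[List[Any]]  # one value array per alternative type

    def __len__(self) -> int:
        return len(self.type_ids)

    def __getitem__(self, i: int) -> Any:
        # Look up the child by type id, then the value by offset.
        return self.children[self.type_ids[i]][self.offsets[i]]


def build_dense_union(values: List[Any], types: List[type]) -> DenseUnion:
    """Pack mixed-type values into dense-union form.

    Assumption: each value's exact Python type appears in `types`,
    and the type id is the index of that type in the list.
    """
    children: List[List[Any]] = [[] for _ in types]
    type_ids: List[int] = []
    offsets: List[int] = []
    for v in values:
        tid = next(k for k, t in enumerate(types) if type(v) is t)
        type_ids.append(tid)
        offsets.append(len(children[tid]))  # next free slot in that child
        children[tid].append(v)
    return DenseUnion(type_ids, offsets, children)


col = build_dense_union([1, "a", 2, 3.5, "b"], [int, str, float])
print(col.type_ids)  # [0, 1, 0, 2, 1]
print(col.offsets)   # [0, 0, 1, 0, 1]
print(col.children)  # [[1, 2], ['a', 'b'], [3.5]]
```

Each child only stores the values of its own type, which is what
makes the layout "dense"; reading slot i is two lookups, as in
__getitem__ above.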

So, after those discussions and after looking into what engines need,
I've iterated on and written up a proposal for what a Canonical
Variant Extension Type for Arrow could be in a Google Doc[5]. I'm
hoping that this can spark some discussion and comments on the
document. If there's rough consensus on it, then I'll work on
creating some implementations of it that I can use to formally propose
the addition to the Canonical Extensions.

Please have a read and leave comments on the Google Doc or on this
thread. Thanks everyone!

--Matt

[1]: https://github.com/apache/arrow-rs/issues/7063
[2]: https://github.com/apache/arrow/issues/45937
[3]: https://github.com/apache/arrow/pull/45375#issuecomment-2649807352
[4]: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
[5]: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing
