Thanks for pursuing this! This is one of the questions I had while reviewing
the beginnings of Variant support in Parquet C++ and Go. I would love to see
this as a canonical extension type, both for its utility as a type and as
further incentive to strengthen the extension system.

Cheers,

-dewey

On Thu, May 8, 2025 at 9:05 PM Ian Cook <ianmc...@apache.org> wrote:

> As Parquet adds new types including Variant, I think it's important
> for us to (a) preserve the ability to efficiently round-trip the full
> set of types between Arrow and Parquet, and (b) add new Arrow types
> (or canonical extension types) to enable Arrow to continue to flourish
> in its role as the in-memory and on-the-wire counterpart to Parquet
> storage. Thank you Matt for pursuing this.
>
> Ian
>
>
> On Thu, May 8, 2025 at 6:03 PM Matt Topol <zotthewiz...@gmail.com> wrote:
> >
> > Hey All,
> >
> > There have been various discussions in many different places
> > (issues, PRs, and so on)[1][2][3], and more that I haven't
> > linked to, concerning what a canonical Variant Extension Type for
> > Arrow might look like. As I've looked into implementing some things,
> > I've also spoken with members of the Arrow, Iceberg and Parquet
> > communities about what a good representation for Arrow Variant would
> > look like in order to ensure good support and adoption.
> >
> > I also looked at the ClickHouse variant implementation [4]. The
> > ClickHouse Variant is nearly equivalent to the Arrow Dense Union type,
> > so we don't need to do any extra work there to support it.
> >
> > So, after discussions and looking into the needs for engines and so
> > on, I've iterated and written up a proposal for what a Canonical
> > Variant Extension Type for Arrow could be in a google doc[5]. I'm
> > hoping that this can spark some discussion and comments on the
> > document. If there's relative consensus on it, then I'll work on
> > creating some implementations of it that I can use to formally propose
> > the addition to the Canonical Extensions.
> >
> > Please take a read and leave comments on the google doc or on this
> > thread. Thanks everyone!
> >
> > --Matt
> >
> > [1]: https://github.com/apache/arrow-rs/issues/7063
> > [2]: https://github.com/apache/arrow/issues/45937
> > [3]: https://github.com/apache/arrow/pull/45375#issuecomment-2649807352
> > [4]: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
> > [5]: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing
>
