But consider that a user might want to define a
non-canonical HLLSKETCH extension type and make use of Arrow
implementations' features for handling JSON canonical extension type
columns in order to handle HLLSKETCH extension type columns. The spec
currently does not provide any means to enable this. I wonder if we should
consider incorporating something like this into the spec.

For example, maybe the colon character could have the special meaning
"represented as" in extension type names, so that implementations would
recognize "hllsketch:arrow.json" as meaning: a column with extension type
hllsketch, which is represented as the JSON canonical extension type.
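
To make the idea concrete, here is a rough sketch in plain Python (no Arrow library involved) of how an implementation might interpret such a name. Everything here is hypothetical and not part of the spec or any existing implementation: the helper names split_extension_name and resolve_handler, the known_handlers set, and the colon convention itself. Only the "ARROW:extension:name" metadata key and the "arrow.json" name come from the existing spec and the JSON proposal.

```python
# Hypothetical sketch of the "represented as" colon convention.
# Nothing here is part of the Arrow spec or of any Arrow library.

EXTENSION_NAME_KEY = "ARROW:extension:name"  # existing field metadata key


def split_extension_name(name):
    """Split 'outer:base' into (outer, base); base is None when there is no colon."""
    outer, sep, base = name.partition(":")
    return outer, (base if sep else None)


def resolve_handler(field_metadata, known_handlers):
    """Return the extension type name whose handling should apply, or None.

    Prefer the outer type if the implementation knows it; otherwise fall
    back to the "represented as" base type; otherwise return None, meaning
    the column is treated as its plain storage type.
    """
    name = field_metadata.get(EXTENSION_NAME_KEY)
    if name is None:
        return None
    outer, base = split_extension_name(name)
    if outer in known_handlers:
        return outer
    if base is not None and base in known_handlers:
        return base
    return None


# An implementation that only knows the canonical JSON type could still
# treat an hllsketch column as JSON, using a single metadata key.
metadata = {EXTENSION_NAME_KEY: "hllsketch:arrow.json"}
handler = resolve_handler(metadata, known_handlers={"arrow.json"})
```

Since both names travel in the single "ARROW:extension:name" value, this would not need two values for the same metadata key.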

Ian

On Tue, Apr 30, 2024 at 11:51 AM Weston Pace <weston.p...@gmail.com> wrote:

> I think "inheritance" and "composition" are more concerns for
> implementations than they are for spec (I could be wrong here).
>
> So it seems that it would be sufficient to write the HLLSKETCH's canonical
> definition as "this is an extension of the JSON logical type and supports
> all the same storage types" and then allow implementations to use whatever
> inheritance / composition scheme they want to behind the scenes.
>
> On Tue, Apr 30, 2024 at 7:47 AM Matt Topol <zotthewiz...@gmail.com> wrote:
>
> > I think the biggest blocker to doing this is the way that we pass extension
> > types through IPC. Extension types are sent as their underlying storage
> > type with metadata key-value pairs of specific keys "ARROW:extension:name"
> > and "ARROW:extension:metadata". Since you can't have multiple values for
> > the same key in the metadata, this would prevent the ability to define an
> > extension type in terms of another extension type as you wouldn't be able
> > to include the metadata for the second-level extension part.
> >
> > i.e. you'd be able to have "ARROW:extension:name" => "HLLSKETCH", but you
> > wouldn't be able to *also* have "ARROW:extension:name" => "JSON" for its
> > storage type. So the storage type needs to be a valid core Arrow data type
> > for this reason.
> >
> > On Tue, Apr 30, 2024 at 10:16 AM Ian Cook <ianmc...@apache.org> wrote:
> >
> > > The vote on adding a JSON canonical extension type [1] got me wondering: Is
> > > it possible to define an extension type that is based on a canonical
> > > extension type? If so, how?
> > >
> > > For example, say I wanted to define a (non-canonical) HLLSKETCH extension
> > > type that corresponds to the type that Redshift uses for HyperLogLog
> > > sketches and is represented as JSON [2]. Is there a way to do this by
> > > building on the JSON canonical extension type?
> > >
> > > [1] https://lists.apache.org/thread/4dw3dnz6rjp5wz2240mn299p51d5tvtq
> > > [2] https://docs.aws.amazon.com/redshift/latest/dg/r_HLLSKTECH_type.html
> > >
> > > Ian
> > >
> >
>
