I don't think there is any current barrier to using implementation
features of one extension type to help with another. In Python, for
example, one might be able to do:

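import pyarrow as pa

class GeoJSONExtensionType(pa.ExtensionType):

    def __init__(self):
        # Illustrative only: storage type and extension name are placeholders
        super().__init__(pa.utf8(), "geojson")
        # Composition: delegate to a (hypothetical) JSON extension type instance
        self._json_ext = pa.JSONExtensionType()

    def some_action(self):
        return self._json_ext.some_action()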

One could do something similar with the Array/Scalar classes. I am not
sure there is anything "automatic" that any current implementation
would be able to offer even if this information were machine-parseable.
The only thing I can think of is that implementations like Arrow C++,
which aggressively drop extension information, might be able to assign
the parent extension type instead of dropping the extension type
entirely; however, I am not sure that this would be useful enough to
ever be implemented.
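To make that fallback idea concrete, here is a minimal Python sketch that assumes Ian's proposed "child:parent" naming convention from further down the thread (e.g. "hllsketch:arrow.json"). That convention is hypothetical and not part of the Arrow spec, and the helper names `parse_extension_chain` and `resolve_extension` are made up for illustration. A reader that does not recognize the most specific name falls back to the next one, and only treats the column as its plain storage type if it recognizes none of them.

```python
def parse_extension_chain(name):
    """Split a hypothetical "child:parent" extension name into its parts,
    most specific first (this colon convention is NOT part of the Arrow spec)."""
    return name.split(":")


def resolve_extension(name, known):
    """Return the most specific extension name this reader understands, or
    None, meaning "treat the column as its plain storage type"."""
    for candidate in parse_extension_chain(name):
        if candidate in known:
            return candidate
    return None


# A reader that only knows the JSON canonical extension type falls back to it.
print(resolve_extension("hllsketch:arrow.json", known={"arrow.json"}))  # arrow.json
```

One cost of reusing ":" this way is that any existing extension name that already contains a colon would be misparsed under the convention.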

-dewey

On Tue, Apr 30, 2024 at 1:31 PM Ian Cook <ianmc...@apache.org> wrote:
>
> But consider that a user might want to define a
> non-canonical HLLSKETCH extension type and make use of Arrow
> implementations' features for handling JSON canonical extension type
> columns in order to handle HLLSKETCH extension type columns. The spec
> currently does not provide any means to enable this. I wonder if we should
> consider incorporating something like this into the spec.
>
> For example, maybe the colon character could have the special meaning
> "represented as" in extension type names, so that implementations would
> recognize "hllsketch:arrow.json" as meaning: a column with extension type
> hllsketch, which is represented as described by the JSON canonical
> extension type.
>
> Ian
>
> On Tue, Apr 30, 2024 at 11:51 AM Weston Pace <weston.p...@gmail.com> wrote:
>
> > I think "inheritance" and "composition" are more concerns for
> > implementations than they are for the spec (I could be wrong here).
> >
> > So it seems that it would be sufficient to write HLLSKETCH's canonical
> > definition as "this is an extension of the JSON logical type and supports
> > all the same storage types" and then allow implementations to use whatever
> > inheritance / composition scheme they want behind the scenes.
> >
> > On Tue, Apr 30, 2024 at 7:47 AM Matt Topol <zotthewiz...@gmail.com> wrote:
> >
> > > I think the biggest blocker to doing this is the way that we pass
> > > extension types through IPC. Extension types are sent as their
> > > underlying storage type with metadata key-value pairs under the specific
> > > keys "ARROW:extension:name" and "ARROW:extension:metadata". Since you
> > > can't have multiple values for the same key in the metadata, this would
> > > prevent defining an extension type in terms of another extension type,
> > > as you wouldn't be able to include the metadata for the second-level
> > > extension part.
> > >
> > > That is, you'd be able to have "ARROW:extension:name" => "HLLSKETCH",
> > > but you wouldn't be able to *also* have "ARROW:extension:name" =>
> > > "JSON" for its storage type. For this reason, the storage type needs to
> > > be a valid core Arrow data type.
> > >
> > > On Tue, Apr 30, 2024 at 10:16 AM Ian Cook <ianmc...@apache.org> wrote:
> > >
> > > > The vote on adding a JSON canonical extension type [1] got me
> > > > wondering: Is it possible to define an extension type that is based
> > > > on a canonical extension type? If so, how?
> > > >
> > > > For example, say I wanted to define a (non-canonical) HLLSKETCH
> > > > extension type that corresponds to the type that Redshift uses for
> > > > HyperLogLog sketches and is represented as JSON [2]. Is there a way to
> > > > do this by building on the JSON canonical extension type?
> > > >
> > > > [1] https://lists.apache.org/thread/4dw3dnz6rjp5wz2240mn299p51d5tvtq
> > > > [2] https://docs.aws.amazon.com/redshift/latest/dg/r_HLLSKTECH_type.html
> > > >
> > > > Ian
> > > >
> > >
> >
