I don't think there is any current barrier to using implementation features of one extension type to help with another. In Python, for example, one might be able to do:

    class GeoJSONExtensionType(pa.ExtensionType):
        def __init__(self):
            self._json_ext = pa.JSONExtensionType()

        def some_action(self):
            return self._json_ext.some_action()

One could do something similar with the Array/Scalar classes.

I am not sure there is anything "automatic" that any current implementation would be able to offer even if this information were machine parseable. The only thing I can think of is that implementations like Arrow C++ that aggressively drop extension information might be able to drop the extension type by assigning a different one; however, I am not sure that it would be useful enough to ever be implemented.

-dewey

On Tue, Apr 30, 2024 at 1:31 PM Ian Cook <ianmc...@apache.org> wrote:
>
> But consider that a user might want to define a
> non-canonical HLLSKETCH extension type and make use of Arrow
> implementations' features for handling JSON canonical extension type
> columns in order to handle HLLSKETCH extension type columns. The spec
> currently does not provide any means to enable this. I wonder if we should
> consider incorporating something like this into the spec.
>
> For example, maybe the colon character could have the special meaning
> "represented as" in extension type names, so that implementations would
> recognize "hllsketch:arrow.json" as meaning: a column with extension type
> hllsketch, which is represented as in the JSON canonical extension type.
>
> Ian
>
> On Tue, Apr 30, 2024 at 11:51 AM Weston Pace <weston.p...@gmail.com> wrote:
>
> > I think "inheritance" and "composition" are more concerns for
> > implementations than they are for spec (I could be wrong here).
> >
> > So it seems that it would be sufficient to write the HLLSKETCH's canonical
> > definition as "this is an extension of the JSON logical type and supports
> > all the same storage types" and then allow implementations to use whatever
> > inheritance / composition scheme they want to behind the scenes.
> >
> > On Tue, Apr 30, 2024 at 7:47 AM Matt Topol <zotthewiz...@gmail.com> wrote:
> >
> > > I think the biggest blocker to doing this is the way that we pass extension
> > > types through IPC. Extension types are sent as their underlying storage
> > > type with metadata key-value pairs of specific keys "ARROW:extension:name"
> > > and "ARROW:extension:metadata". Since you can't have multiple values for
> > > the same key in the metadata, this would prevent the ability to define an
> > > extension type in terms of another extension type as you wouldn't be able
> > > to include the metadata for the second-level extension part.
> > >
> > > i.e. you'd be able to have "ARROW:extension:name" => "HLLSKETCH", but you
> > > wouldn't be able to *also* have "ARROW:extension:name" => "JSON" for its
> > > storage type. So the storage type needs to be a valid core Arrow data type
> > > for this reason.
> > >
> > > On Tue, Apr 30, 2024 at 10:16 AM Ian Cook <ianmc...@apache.org> wrote:
> > >
> > > > The vote on adding a JSON canonical extension type [1] got me wondering: Is
> > > > it possible to define an extension type that is based on a canonical
> > > > extension type? If so, how?
> > > >
> > > > For example, say I wanted to define a (non-canonical) HLLSKETCH extension
> > > > type that corresponds to the type that Redshift uses for HyperLogLog
> > > > sketches and is represented as JSON [2]. Is there a way to do this by
> > > > building on the JSON canonical extension type?
> > > >
> > > > [1] https://lists.apache.org/thread/4dw3dnz6rjp5wz2240mn299p51d5tvtq
> > > > [2] https://docs.aws.amazon.com/redshift/latest/dg/r_HLLSKTECH_type.html
> > > >
> > > > Ian
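
For reference, the metadata limitation described in the quoted thread, and the colon naming idea floated as a workaround, can be sketched in plain Python. This is only an illustration under stated assumptions: the field metadata is modeled as an ordinary dict (one value per key), and the helpers tag_field and split_represented_as are hypothetical names, not part of any Arrow implementation. The "name:base" convention is the proposal from the thread, not something the Arrow spec defines.

```python
# Keys named in the quoted thread for carrying extension types through IPC.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


def tag_field(metadata, ext_name, ext_metadata=""):
    """Return a copy of `metadata` tagged with an extension type.

    Hypothetical helper: metadata is modeled as a plain dict, so each
    key can hold only one value.
    """
    tagged = dict(metadata)
    tagged[EXT_NAME_KEY] = ext_name
    tagged[EXT_META_KEY] = ext_metadata
    return tagged


def split_represented_as(ext_name):
    """Split a name using the proposed "name:base" convention.

    Hypothetical helper: returns (name, base), where base is None when
    the name contains no colon.
    """
    name, sep, base = ext_name.partition(":")
    return (name, base if sep else None)


# Tagging the same field twice overwrites the first extension name, which
# is why a second-level extension type cannot be recorded this way.
meta = tag_field({}, "arrow.json")
meta = tag_field(meta, "hllsketch")
print(meta[EXT_NAME_KEY])

# Under the proposed convention, both names would fit in the single key.
print(split_represented_as("hllsketch:arrow.json"))
```

The point of the second helper is that a "represented as" relationship would have to be encoded inside the single name value (or inside the extension metadata string), since a second "ARROW:extension:name" entry is not available.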