Depending on where your Arrow-encoded data is used, either extension
types or generic field metadata are options. We have this problem in
the ADBC Postgres driver, where we can convert *most* Postgres types
to an Arrow type, but there are some for which a conversion is
impossible, unknown, or not yet implemented. For these we currently
return opaque binary (the Postgres COPY representation of the value)
and attach field metadata so that a consumer can implement a
workaround for an unsupported type. It arguably would have been better
to implement this as an extension type; however, field metadata felt
like less of a commitment when I first worked on this.
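
For illustration, here is a minimal sketch of what such a consumer-side
workaround could look like. It is plain Python with field metadata
modeled as a dict rather than a real Arrow schema. The
"ARROW:extension:name" key comes from the Arrow columnar format; the
"ADBC:postgresql:typname" key is my assumption about how the driver
labels opaque columns, so check the driver documentation for the exact
spelling. The helper names (describe_field, decode_opaque,
OPAQUE_DECODERS) are hypothetical.

```python
import uuid

# Key defined by the Arrow columnar format for extension types.
EXTENSION_NAME_KEY = "ARROW:extension:name"
# Assumed key carrying the Postgres type name on opaque binary columns.
PG_TYPNAME_KEY = "ADBC:postgresql:typname"

# Hypothetical consumer-side decoders keyed by Postgres type name.
# The Postgres binary representation of a uuid is its 16 raw bytes.
OPAQUE_DECODERS = {
    "uuid": lambda raw: uuid.UUID(bytes=raw),
}


def describe_field(metadata):
    """Classify a field by its metadata (modeled as a str -> str dict)."""
    if EXTENSION_NAME_KEY in metadata:
        return "extension:" + metadata[EXTENSION_NAME_KEY]
    if PG_TYPNAME_KEY in metadata:
        return "opaque:" + metadata[PG_TYPNAME_KEY]
    return "plain"


def decode_opaque(metadata, values):
    """Decode opaque binary values when a decoder for the type is known."""
    decoder = OPAQUE_DECODERS.get(metadata.get(PG_TYPNAME_KEY))
    if decoder is None:
        return list(values)  # unknown type: pass the raw bytes through
    return [None if v is None else decoder(v) for v in values]
```

With a real Arrow library the same dispatch would read the metadata
from the schema's fields; an extension type would move this lookup out
of the consumer and into a registered type instead.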

Cheers,

-dewey

On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
<norman.jor...@improving.com.invalid> wrote:
>
> I was using UUID as an example. It looks like extension types cover my
> original request.
> ________________________________
> From: Felipe Oliveira Carvalho <felipe...@gmail.com>
> Sent: Thursday, April 11, 2024 7:15 AM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: Unsupported/Other Type
>
> The OP used UUID as an example. Would that be enough, or is the request for
> a flexible mechanism that allows the creation of one-off nominal types for
> very specific use-cases?
>
> —
> Felipe
>
> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > Yes, JSON and UUID are obvious candidates for new canonical extension
> > types. XML also comes to mind, but I'm not sure there's much of a use
> > case for it.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 10/04/2024 à 22:55, Wes McKinney a écrit :
> > > In the past we have discussed adding a canonical type for UUID and JSON. I
> > > still think this is a good idea and could improve ergonomics in downstream
> > > language bindings (e.g. by exposing JSON querying function or automatically
> > > boxing UUIDs in built-in UUID types, like the Python uuid library). Has
> > > anyone done any work on this to anyone's knowledge?
> > >
> > > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
> > > wrote:
> > >
> > >> Hi Norman,
> > >> Arrow has a concept of extension types [1] along with the possibility of
> > >> proposing new canonical extension types [2].  This seems to cover the
> > >> use-cases you mention but I might be misunderstanding?
> > >>
> > >> Thanks,
> > >> Micah
> > >>
> > >> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
> > >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
> > >>
> > >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
> > >> <norman.jor...@improving.com.invalid> wrote:
> > >>
> > >>> Problem Description
> > >>>
> > >>> Currently Arrow schemas can only contain columns of types supported by
> > >>> Arrow. In some cases an Arrow schema maps to an external schema. This can
> > >>> result in the Arrow schema not being able to support all the columns from
> > >>> the external schema.
> > >>>
> > >>> Consider an external system that contains a column of type UUID. To model
> > >>> the schema in Arrow, the user has two choices:
> > >>>
> > >>>    1.  Do not include the UUID column in the Arrow schema
> > >>>
> > >>>    2.  Map the column to an existing Arrow type. This will not include the
> > >>> original type information. A UUID can be mapped to a FixedSizeBinary, but
> > >>> consumers of the Arrow schema will be unable to distinguish a
> > >>> FixedSizeBinary field from a UUID field.
> > >>>
> > >>> Possible Solution
> > >>>
> > >>>    *   Add a new type code that represents unsupported types
> > >>>
> > >>>    *   Values for the new type are represented as variable length binary
> > >>>
> > >>> Some drivers can expose data even when they don’t understand the data
> > >>> type. For example, the PostgreSQL driver will return the raw bytes for
> > >>> fields of an unknown type. Using an explicit type lets clients know that
> > >>> they should convert values if they were able to determine the actual data
> > >>> type.
> > >>>
> > >>> Questions
> > >>>
> > >>>    *   What is the impact on existing clients when they encounter fields of
> > >>> the unsupported type?
> > >>>
> > >>>    *   Is it safe to assume that all unsupported values can safely be
> > >>> converted to a variable length binary?
> > >>>
> > >>>    *   How can we preserve information about the original type?
> > >>>
> > >>>
> > >>
> > >
> >
