Re: Unsupported/Other Type

Rok Mihevc Wed, 10 Apr 2024 15:47:10 -0700

There are JSON [1] and UUID [2] PRs open. I don't know the status of the former
(it seems to be stuck in review), but I plan to work on the UUID PR this week.


[1] https://github.com/apache/arrow/pull/13901
[2] https://github.com/apache/arrow/pull/37298
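
For illustration only, here is a plain-Python sketch of how a UUID column could be described as an extension type over FixedSizeBinary(16) storage. It models the schema as a dict rather than using any Arrow library. The field-metadata keys `ARROW:extension:name` and `ARROW:extension:metadata` are the ones the Arrow columnar format reserves for extension types. The extension name `example.uuid` and the helper functions are hypothetical; the real canonical name is whatever the UUID PR settles on.

```python
import uuid

# Field-level metadata keys reserved by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical extension name, for illustration only.
UUID_EXT_NAME = "example.uuid"


def uuid_field(name):
    """Describe a UUID column: FixedSizeBinary(16) storage plus extension metadata."""
    return {
        "name": name,
        "storage_type": "fixed_size_binary[16]",
        "metadata": {EXT_NAME_KEY: UUID_EXT_NAME, EXT_META_KEY: ""},
    }


def encode_uuids(values):
    """Convert uuid.UUID objects to the 16-byte values held in the storage column."""
    return [u.bytes for u in values]


def decode_column(field, raw_values):
    """Box values as uuid.UUID only if the field carries the UUID extension name.

    A consumer that does not recognize the name falls back to the plain
    16-byte binary values, i.e. it sees only the storage type.
    """
    if field.get("metadata", {}).get(EXT_NAME_KEY) == UUID_EXT_NAME:
        return [uuid.UUID(bytes=b) for b in raw_values]
    return list(raw_values)


if __name__ == "__main__":
    field = uuid_field("id")
    ids = [uuid.UUID("12345678-1234-5678-1234-567812345678")]
    raw = encode_uuids(ids)
    print(decode_column(field, raw))  # [UUID('12345678-1234-5678-1234-567812345678')]
```

The extension name is what lets a consumer tell a UUID column apart from an arbitrary FixedSizeBinary(16) column. A binding that recognizes the name can box the values into its language's native UUID type.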

On Thu, Apr 11, 2024 at 12:31 AM James Duong
<james.du...@improving.com.invalid> wrote:

> It’s worth noting that this maps well to the User data type field in the
> XdbcTypeInfo APIs for Flight SQL.
>
> From: David Li <lidav...@apache.org>
> Date: Wednesday, April 10, 2024 at 3:23 PM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: Unsupported/Other Type
> I think this should be an extension type, yes.
>
> It could be parametrized on the storage type; the other system might at
> least know that one type is based on another (e.g. a user-defined type).
> Type metadata can be preserved in the extension type's metadata.
>
> I think it would be good to have standard UUID and JSON extension types. I
> don't think anyone is actively working on it.
>
> On Thu, Apr 11, 2024, at 05:55, Wes McKinney wrote:
> > In the past we have discussed adding a canonical type for UUID and JSON. I
> > still think this is a good idea and could improve ergonomics in downstream
> > language bindings (e.g. by exposing JSON querying functions or automatically
> > boxing UUIDs in built-in UUID types, like the Python uuid module). Has
> > anyone done any work on this, to anyone's knowledge?
> >
> > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> Hi Norman,
> >> Arrow has a concept of extension types [1] along with the possibility of
> >> proposing new canonical extension types [2]. This seems to cover the
> >> use cases you mention, but am I misunderstanding something?
> >>
> >> Thanks,
> >> Micah
> >>
> >> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
> >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
> >>
> >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
> >> <norman.jor...@improving.com.invalid> wrote:
> >>
> >> > Problem Description
> >> >
> >> > Currently, Arrow schemas can only contain columns of types supported by
> >> > Arrow. In some cases an Arrow schema maps to an external schema, and the
> >> > Arrow schema may then be unable to represent all the columns of the
> >> > external schema.
> >> >
> >> > Consider an external system that contains a column of type UUID. To
> >> > model the schema in Arrow, the user has two choices:
> >> >
> >> >   1.  Do not include the UUID column in the Arrow schema.
> >> >
> >> >   2.  Map the column to an existing Arrow type. This loses the original
> >> >       type information. A UUID can be mapped to a FixedSizeBinary, but
> >> >       consumers of the Arrow schema will be unable to distinguish a
> >> >       FixedSizeBinary field from a UUID field.
> >> >
> >> > Possible Solution
> >> >
> >> >   *   Add a new type code that represents unsupported types.
> >> >
> >> >   *   Values of the new type are represented as variable-length binary.
> >> >
> >> > Some drivers can expose data even when they don't understand the data
> >> > type. For example, the PostgreSQL driver will return the raw bytes for
> >> > fields of an unknown type. An explicit type lets clients know that they
> >> > should convert the values if they are able to determine the actual data
> >> > type.
> >> >
> >> > Questions
> >> >
> >> >   *   What is the impact on existing clients when they encounter fields
> >> >       of the unsupported type?
> >> >
> >> >   *   Is it safe to assume that all unsupported values can be converted
> >> >       to variable-length binary?
> >> >
> >> >   *   How can we preserve information about the original type?
> >> >
> >> >
> >>
>
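
Following David's suggestion in the thread (an extension type parametrized on its storage type, with the original type recorded in the extension metadata), here is a hedged plain-Python sketch of how Norman's three questions might map onto that approach. Clients unaware of the extension name read the column as its storage type. The storage need not be variable-length binary. The source type name survives as JSON in `ARROW:extension:metadata`. The extension name `example.other`, the JSON keys `type_name` and `source`, and the helper functions are all hypothetical, not a canonical Arrow definition.

```python
import json

# Field-level metadata keys reserved by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical extension name and metadata layout, for illustration only.
OTHER_EXT_NAME = "example.other"


def other_field(name, storage_type, type_name, source_system):
    """Describe a column whose source type Arrow cannot represent natively.

    storage_type is whatever the driver can actually produce (for example
    "binary" for raw bytes); the original type is kept as JSON metadata.
    """
    meta = json.dumps({"type_name": type_name, "source": source_system})
    return {
        "name": name,
        "storage_type": storage_type,
        "metadata": {EXT_NAME_KEY: OTHER_EXT_NAME, EXT_META_KEY: meta},
    }


def original_type(field):
    """Return the preserved source type name, or None for ordinary fields."""
    md = field.get("metadata", {})
    if md.get(EXT_NAME_KEY) != OTHER_EXT_NAME:
        return None
    return json.loads(md[EXT_META_KEY])["type_name"]


if __name__ == "__main__":
    f = other_field("shape", "binary", "geometry", "postgresql")
    print(f["storage_type"], original_type(f))  # binary geometry
```

Because the storage type is a parameter, a driver that knows more than "raw bytes" (for example, that a user-defined type is based on a string) can expose that storage instead of binary.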
