There are JSON [1] and UUID [2] PRs open. I don't know about the former (seems to be stuck in review), but I plan to work on the UUID PR this week.
[1] https://github.com/apache/arrow/pull/13901
[2] https://github.com/apache/arrow/pull/37298

On Thu, Apr 11, 2024 at 12:31 AM James Duong <james.du...@improving.com.invalid> wrote:

> It’s worth noting that this maps well to the User data type field in the
> XdbcTypeInfo APIs for Flight SQL.
>
> From: David Li <lidav...@apache.org>
> Date: Wednesday, April 10, 2024 at 3:23 PM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: Unsupported/Other Type
>
> I think this should be an extension type, yes.
>
> It could be parametrized on the storage type; the other system might at
> least know that one type is based on another (e.g. a user defined type).
> Type metadata can be preserved in the extension type's metadata.
>
> I think it would be good to have standard UUID and JSON extension types. I
> don't think anyone is actively working on it.
>
> On Thu, Apr 11, 2024, at 05:55, Wes McKinney wrote:
> > In the past we have discussed adding a canonical type for UUID and JSON. I
> > still think this is a good idea and could improve ergonomics in downstream
> > language bindings (e.g. by exposing JSON querying function or automatically
> > boxing UUIDs in built-in UUID types, like the Python uuid library). Has
> > anyone done any work on this to anyone's knowledge?
> >
> > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> Hi Norman,
> >> Arrow has a concept of extension types [1] along with the possibility of
> >> proposing new canonical extension types [2]. This seems to cover the
> >> use-cases you mention but I might be misunderstanding?
> >>
> >> Thanks,
> >> Micah
> >>
> >> [1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
> >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
> >>
> >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
> >> <norman.jor...@improving.com.invalid> wrote:
> >>
> >> > Problem Description
> >> >
> >> > Currently Arrow schemas can only contain columns of types supported by
> >> > Arrow. In some cases an Arrow schema maps to an external schema. This can
> >> > result in the Arrow schema not being able to support all the columns from
> >> > the external schema.
> >> >
> >> > Consider an external system that contains a column of type UUID. To model
> >> > the schema in Arrow, the user has two choices:
> >> >
> >> > 1. Do not include the UUID column in the Arrow schema
> >> >
> >> > 2. Map the column to an existing Arrow type. This will not include the
> >> > original type information. A UUID can be mapped to a FixedSizeBinary, but
> >> > consumers of the Arrow schema will be unable to distinguish a
> >> > FixedSizeBinary field from a UUID field.
> >> >
> >> > Possible Solution
> >> >
> >> > * Add a new type code that represents unsupported types
> >> >
> >> > * Values for the new type are represented as variable length binary
> >> >
> >> > Some drivers can expose data even when they don’t understand the data
> >> > type. For example, the PostgreSQL driver will return the raw bytes for
> >> > fields of an unknown type. Using an explicit type lets clients know that
> >> > they should convert values if they were able to determine the actual data
> >> > type.
> >> >
> >> > Questions
> >> >
> >> > * What is the impact on existing clients when they encounter fields of
> >> > the unsupported type?
> >> >
> >> > * Is it safe to assume that all unsupported values can safely be
> >> > converted to a variable length binary?
> >> >
> >> > * How can we preserve information about the original type?
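
For illustration only, here is a rough sketch of the extension-type idea discussed in the thread: a field keeps its FixedSizeBinary storage type, and the original type travels in the field metadata, so a consumer can tell a UUID from plain 16-byte binary. It uses only the Python standard library rather than a real Arrow binding. The `Field` class, the `describe` and `decode` helpers, the `example.uuid` name, and the metadata string are all assumptions made up for this sketch; only the two `ARROW:extension:*` metadata keys come from the Arrow columnar format documentation.

```python
import uuid
from dataclasses import dataclass, field

# Metadata keys the Arrow columnar format reserves for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Placeholder extension name, invented for this sketch.
UUID_EXT_NAME = "example.uuid"


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for an Arrow field (not a real Arrow class)."""
    name: str
    storage_type: str
    metadata: dict = field(default_factory=dict)


def describe(f: Field) -> str:
    """Return the extension name if the field has one, else its storage type."""
    return f.metadata.get(EXT_NAME_KEY, f.storage_type)


def decode(f: Field, raw: bytes):
    """Box raw bytes as uuid.UUID when the field is tagged as a UUID."""
    if f.metadata.get(EXT_NAME_KEY) == UUID_EXT_NAME:
        return uuid.UUID(bytes=raw)
    # Consumers that do not recognise the extension fall back to raw storage.
    return raw


# Same storage type; only the metadata differs.
plain = Field("id", "fixed_size_binary[16]")
tagged = Field(
    "id",
    "fixed_size_binary[16]",
    {EXT_NAME_KEY: UUID_EXT_NAME, EXT_META_KEY: "source_type=uuid"},
)

raw = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes
print(describe(plain), decode(plain, raw))
print(describe(tagged), decode(tagged, raw))
```

A consumer that does not know the extension name still sees valid fixed-size binary data, which seems to be the main advantage over a new "unsupported" type code: existing clients keep working, and the original type information is preserved for those that can use it.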