Re: Unsupported/Other Type

Antoine Pitrou Thu, 11 Apr 2024 07:24:43 -0700

One-off nominal types can already be created as application-specific
extension types.

The specific thing about UUID, JSON and a couple other types is that
they exist in many systems already, so a standardized way of conveying
them with Arrow would enhance interoperation between all these systems.


Regards

Antoine.


On 11/04/2024 at 16:15, Felipe Oliveira Carvalho wrote:

The OP used UUID as an example. Would that be enough, or is the request
for a flexible mechanism that allows the creation of one-off nominal types
for very specific use-cases?

—
Felipe

On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou <[email protected]> wrote:


Yes, JSON and UUID are obvious candidates for new canonical extension
types. XML also comes to mind, but I'm not sure there's much of a use
case for it.

Regards

Antoine.


On 10/04/2024 at 22:55, Wes McKinney wrote:

In the past we have discussed adding a canonical type for UUID and JSON.
I still think this is a good idea and could improve ergonomics in
downstream language bindings (e.g. by exposing JSON querying functions or
automatically boxing UUIDs in built-in UUID types, like the Python uuid
library). Has anyone done any work on this to anyone's knowledge?
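
To make the "boxing" idea concrete, here is a minimal sketch using only the Python standard library. It assumes the UUID column arrives as 16-byte values (as a FixedSizeBinary(16) storage would provide); the helper name `box_uuids` is hypothetical and is not part of any Arrow binding.

```python
import uuid


def box_uuids(values):
    """Turn raw 16-byte values into uuid.UUID objects; None stays None (null)."""
    boxed = []
    for raw in values:
        if raw is None:
            boxed.append(None)
            continue
        if len(raw) != 16:
            raise ValueError(f"expected 16 bytes, got {len(raw)}")
        boxed.append(uuid.UUID(bytes=raw))
    return boxed


# Illustrative column: one UUID value and one null.
column = [uuid.UUID("12345678-1234-5678-1234-567812345678").bytes, None]
print(box_uuids(column))
```

A language binding could apply a conversion like this automatically once the field is known to carry UUIDs rather than arbitrary 16-byte blobs.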

On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield <[email protected]>
wrote:

Hi Norman,
Arrow has a concept of extension types [1] along with the possibility of
proposing new canonical extension types [2].  This seems to cover the
use-cases you mention but I might be misunderstanding?

Thanks,
Micah

[1] https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
[2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
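
As a rough illustration of the mechanism described in [1]: an extension type is a regular storage type plus two custom metadata entries on the field, `ARROW:extension:name` and `ARROW:extension:metadata`. The sketch below models that with plain Python objects rather than a real Arrow library. The `Field` class, the `logical_type` helper and the extension name `example.uuid` are all hypothetical.

```python
from dataclasses import dataclass, field

# Metadata keys defined by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Hypothetical stand-in for an Arrow field; not a real library class."""
    name: str
    storage_type: str
    metadata: dict = field(default_factory=dict)


def logical_type(f, known_extensions):
    """Return the extension name if the consumer knows it, else the storage type."""
    ext = f.metadata.get(EXT_NAME_KEY)
    if ext in known_extensions:
        return ext
    return f.storage_type


plain = Field("payload", "fixed_size_binary[16]")
tagged = Field(
    "id",
    "fixed_size_binary[16]",
    {EXT_NAME_KEY: "example.uuid", EXT_META_KEY: ""},  # assumed extension name
)

print(logical_type(plain, {"example.uuid"}))
print(logical_type(tagged, {"example.uuid"}))
print(logical_type(tagged, set()))  # unaware consumer sees only the storage type
```

The point of the last call is that a consumer that does not recognize the extension name can still read the column as its storage type, which is why this approach does not break existing clients.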

On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
<[email protected]> wrote:

Problem Description

Currently Arrow schemas can only contain columns of types supported by
Arrow. In some cases an Arrow schema maps to an external schema. This can
result in the Arrow schema not being able to support all the columns from
the external schema.

Consider an external system that contains a column of type UUID. To model
the schema in Arrow, the user has two choices:

    1.  Do not include the UUID column in the Arrow schema

    2.  Map the column to an existing Arrow type. This will not include
        the original type information. A UUID can be mapped to a
        FixedSizeBinary, but consumers of the Arrow schema will be unable
        to distinguish a FixedSizeBinary field from a UUID field.

Possible Solution

    *   Add a new type code that represents unsupported types

    *   Values for the new type are represented as variable length binary


Some drivers can expose data even when they don’t understand the data
type. For example, the PostgreSQL driver will return the raw bytes for
fields of an unknown type. Using an explicit type lets clients know that
they should convert values if they are able to determine the actual data
type.
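
One way to picture the proposal above is the following sketch, which is an assumption about how a client might consume such a column and not an existing Arrow API. The `OpaqueColumn` class, its `source_type` attribute and the `DECODERS` registry are hypothetical names introduced for illustration. Values travel as raw bytes, and the client converts them only when it recognizes the original type name.

```python
import uuid
from dataclasses import dataclass


@dataclass
class OpaqueColumn:
    """Hypothetical column whose source type the driver does not understand."""
    name: str
    source_type: str  # type name reported by the external system
    values: list      # raw bytes per row, None for null


# Client-side decoders keyed by source type name (illustrative registry).
DECODERS = {"uuid": lambda raw: uuid.UUID(bytes=raw)}


def decode(column):
    """Convert values if the source type is known, else return the raw bytes."""
    decoder = DECODERS.get(column.source_type)
    if decoder is None:
        return column.values
    return [None if v is None else decoder(v) for v in column.values]


known = OpaqueColumn("id", "uuid", [uuid.UUID(int=7).bytes, None])
unknown = OpaqueColumn("shape", "geometry", [b"\x01\x02"])

print(decode(known))
print(decode(unknown))
```

Carrying the source type name alongside the bytes is what lets the client decide whether a conversion is possible; without it, the column is indistinguishable from ordinary binary data.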

Questions

    *   What is the impact on existing clients when they encounter fields
        of the unsupported type?

    *   Is it safe to assume that all unsupported values can safely be
        converted to a variable length binary?

    *   How can we preserve information about the original type?
