Problem Description

Currently, Arrow schemas can only contain columns of types supported by Arrow. 
In some cases an Arrow schema maps to an external schema. When the external 
schema uses types that Arrow does not support, the Arrow schema cannot 
represent all of its columns.

Consider an external system that contains a column of type UUID. To model the 
schema in Arrow, the user has two choices:

  1.  Do not include the UUID column in the Arrow schema

  2.  Map the column to an existing Arrow type. The original type information 
is lost: a UUID can be mapped to a FixedSizeBinary, but consumers of the Arrow 
schema will be unable to distinguish an ordinary FixedSizeBinary field from one 
that holds UUIDs.
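
To illustrate the second choice, here is a minimal Python sketch using only 
the standard library. The Field class, the type labels, and the mapping table 
are hypothetical stand-ins for illustration, not Arrow APIs. It assumes an 
external schema with a UUID column and a plain 16-byte binary column.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field: a name plus a type label."""
    name: str
    type: str


# Assumed external schema: one UUID column and one plain 16-byte column.
external_schema = {"id": "uuid", "digest": "binary(16)"}

# Both external types fit in 16 bytes, so both map to the same label.
TYPE_MAPPING = {
    "uuid": "fixed_size_binary[16]",
    "binary(16)": "fixed_size_binary[16]",
}

fields = [Field(name, TYPE_MAPPING[t]) for name, t in external_schema.items()]

# A UUID value serializes to exactly 16 bytes.
value = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes

# After mapping, the two fields carry identical type information.
same_type = fields[0].type == fields[1].type
```

After the mapping, nothing in the resulting fields records that "id" was 
originally a UUID.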

Possible Solution

  *   Add a new type code that represents unsupported types

  *   Values for the new type are represented as variable length binary

Some drivers can expose data even when they don’t understand the data type. For 
example, the PostgreSQL driver will return the raw bytes for fields of an 
unknown type. Using an explicit type tells clients that the values have not 
been converted, and that they should convert them if they are able to determine 
the actual data type.
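
A rough sketch of how a client might handle such a column, again in plain 
Python. The UNSUPPORTED type code, the read_column helper, and the decoder 
argument are hypothetical names used for illustration only. The sketch assumes 
the client learns the actual type by some means outside the schema.

```python
import uuid

UNSUPPORTED = "unsupported"  # hypothetical type code for unsupported types


def read_column(type_code, raw_values, decoder=None):
    """Decode values of an UNSUPPORTED column if a decoder is known.

    Otherwise the values are returned unchanged as raw bytes.
    """
    if type_code == UNSUPPORTED and decoder is not None:
        return [decoder(v) for v in raw_values]
    return list(raw_values)


# Raw bytes as a driver might return them for a type it does not understand.
raw = [uuid.UUID(int=1).bytes, uuid.UUID(int=2).bytes]

# A client that does not know the actual type keeps the bytes.
as_bytes = read_column(UNSUPPORTED, raw)

# A client that knows the column holds UUIDs converts the values.
as_uuid = read_column(UNSUPPORTED, raw, decoder=lambda b: uuid.UUID(bytes=b))
```

The explicit type code is what tells the client that a conversion may still be 
needed. A plain binary column would give no such signal.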

Questions

  *   What is the impact on existing clients when they encounter fields of the 
unsupported type?

  *   Is it safe to assume that all unsupported values can be converted to 
variable length binary?

  *   How can we preserve information about the original type?
