Hello all,

Recently, a PR to Arrow C++ [1] was opened to allow implicit casting from
any extension type to its storage type in Acero. This raises questions
about the validity of applying operations to an extension array's storage.
For example, some extension type authors may intend a different ordering
for arrays of their new type than would be applied to the array's storage,
or may not intend for the type to participate in arithmetic even though its
storage could.

Suggestions/observations from discussion on that PR included:
- Extension types could provide a general semantic description of storage
type equivalence [2], so that a flag on the extension type enables ordering
by storage but disables arithmetic on it.
- Compute functions or kernels could be augmented with a filter declaring
which extension types are supported [3].
- Currently arrow-rs treats extension types as metadata only [4], so all
kernels treat extension arrays equivalently to their storage.
- Currently Arrow C++ only supports explicit casting from an extension
type to its storage (and from storage to the extension type), so any
operation can be performed on an extension array's storage, but doing so
requires opting in.
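
To make the first suggestion more concrete, here is a minimal sketch in
plain Python of what a per-type semantic flag might look like. Nothing in
it is an existing Arrow API: StorageSemantics, ExtensionType, REQUIRED and
can_dispatch_on_storage are hypothetical names invented for illustration,
and the function names in the table are only examples.

```python
from dataclasses import dataclass
from enum import Flag, auto


class StorageSemantics(Flag):
    """Hypothetical flags: which storage behaviors an extension type inherits."""
    NONE = 0
    ORDERING = auto()
    ARITHMETIC = auto()


@dataclass(frozen=True)
class ExtensionType:
    """Toy stand-in for an extension type; this is not an Arrow class."""
    name: str
    storage: str
    semantics: StorageSemantics = StorageSemantics.NONE


# Hypothetical table: the storage behavior each function relies on.
REQUIRED = {
    "sort_indices": StorageSemantics.ORDERING,
    "add": StorageSemantics.ARITHMETIC,
}


def can_dispatch_on_storage(function: str, ext_type: ExtensionType) -> bool:
    """Return True if the type opted in to the behavior the function needs."""
    required = REQUIRED[function]
    return (ext_type.semantics & required) == required


# An extension type that orders like its int64 storage but is not a number.
opaque_id = ExtensionType("example.opaque_id", "int64", StorageSemantics.ORDERING)
sortable = can_dispatch_on_storage("sort_indices", opaque_id)  # True
addable = can_dispatch_on_storage("add", opaque_id)            # False
```

Under this assumption, an implicit cast to storage would only be inserted
when the check passes; an explicit cast would remain available as the
opt-in for everything else.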

Sincerely,
Ben Kietzman

[1] https://github.com/apache/arrow/pull/39200
[2] https://github.com/apache/arrow/pull/39200#issuecomment-1852307954
[3] https://github.com/apache/arrow/pull/39200#issuecomment-1852676161
[4] https://github.com/apache/arrow/pull/39200#issuecomment-1852881651
