bkietz commented on PR #39200:
URL: https://github.com/apache/arrow/pull/39200#issuecomment-1852676161
> I would suggest we perhaps need a more general semantic description of storage type equivalence.


Another way to handle this would be with a property on kernels (or even functions) instead of on `ExtensionType`:
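```c++
class ARROW_EXPORT InputType {
 public:
  // Proposed: when true, an extension-typed argument is accepted if its
  // storage type matches, and the kernel then operates on the storage.
  bool accepts_extension_types_if_storage_matches = false;
  // ...
};

// ...

// Opting in for a single kernel, e.g. the first argument of a filter kernel:
filter_kernel->signature->in_types()[0].accepts_extension_types_if_storage_matches =
    true;
```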
This would disable operating on extension arrays' storage by default, while allowing it to be enabled per kernel or per function. (The filter for accepted extension types could of course also be more expressive than a simple `bool`.)

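
To make the proposed default concrete, here is a minimal, self-contained sketch of how such a flag might be consulted when matching an argument. `DataType`, `MakeType`, `MakeExtension`, and `InputType::Matches` are hypothetical, simplified stand-ins written for illustration, not Arrow's actual classes; only the flag name is taken from the snippet above.

```c++
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-ins for arrow::DataType and
// arrow::compute::InputType. They are NOT the real Arrow C++ classes.
struct DataType {
  std::string name;                   // e.g. "int64" or "example.label"
  std::shared_ptr<DataType> storage;  // non-null only for extension types
  bool is_extension() const { return storage != nullptr; }
};

std::shared_ptr<DataType> MakeType(std::string name) {
  return std::make_shared<DataType>(DataType{std::move(name), nullptr});
}

std::shared_ptr<DataType> MakeExtension(std::string name,
                                        std::shared_ptr<DataType> storage) {
  assert(storage != nullptr && "an extension type needs a storage type");
  return std::make_shared<DataType>(
      DataType{std::move(name), std::move(storage)});
}

struct InputType {
  std::string accepted;  // name of the exact type this argument accepts
  // The proposed opt-in flag; off by default.
  bool accepts_extension_types_if_storage_matches = false;

  bool Matches(const DataType& type) const {
    if (type.name == accepted) return true;
    // Extension arguments are rejected unless this kernel opted in and the
    // extension's storage type is exactly the accepted type.
    return accepts_extension_types_if_storage_matches && type.is_extension() &&
           type.storage->name == accepted;
  }
};
```

For example, `InputType{"int64"}` rejects an extension type whose storage is `int64` until `accepts_extension_types_if_storage_matches` is set to `true` on that particular kernel argument.
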

I think this would put configurability where it needs to be, since the specific categories (selection, equality comparison, and arithmetic) differ in how difficult they are to support:
- Selection: I can't think of a type that could not be selected by operating on its storage, unless the semantics of an array slot depended on its position in the array or on its neighbors, which seems incompatible with Arrow arrays anyway.
- Equality comparison for extension types would have to be a superset of equality comparison for their storage: if two storage slots are identical then, whatever the extension type, the corresponding slots in the extension array must also be equal. However, it would be possible to define an extension of fixed size list wherein the semantic value of a slot is the sum of the list elements in the storage's slot, in which case `[1, 2]` would be equal to `[1, 2]` but also to `[3, 0]`, so comparing the storage alone would wrongly report the latter pair as unequal.
- Arithmetic is much more complex: even if a type supports addition, it might not support multiplication, and it might only accept an addend of a specific other type (as timestamps can be added to durations but not to other timestamps).
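
The contrast between selection and equality can be sketched using the fixed size list example. This assumes a hypothetical extension whose storage is `fixed_size_list<int, 2>` and whose slot value is the sum of the two elements; the storage is modeled with plain `std::array`s rather than Arrow arrays, and all names are illustrative.

```c++
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: storage is fixed_size_list<int, 2>, and the extension
// defines a slot's value as the sum of its two elements.
using Slot = std::array<int, 2>;
using Storage = std::vector<Slot>;

// Selection (filter) only moves whole slots around, so running it on the
// storage gives the right answer whatever the extension's semantics are.
Storage Filter(const Storage& values, const std::vector<bool>& mask) {
  assert(values.size() == mask.size());
  Storage out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (mask[i]) out.push_back(values[i]);
  }
  return out;
}

// Equality on the storage: only identical slots compare equal.
bool StorageEquals(const Slot& a, const Slot& b) { return a == b; }

// Equality under the hypothetical "sum" semantics of the extension type.
bool SumEquals(const Slot& a, const Slot& b) {
  return a[0] + a[1] == b[0] + b[1];
}
```

Filtering `[[1, 2], [3, 0], [5, 5]]` with mask `[true, false, true]` keeps `[1, 2]` and `[5, 5]`, which is correct under either interpretation, whereas `StorageEquals` and `SumEquals` disagree on `[1, 2]` versus `[3, 0]`.
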

Given that supporting the various operations on storage types is so nuanced, it seems we'd need to reserve handling of it for a system that can express those nuances, such as kernel dispatch.

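
The following sketch illustrates why signature-based dispatch can express these nuances. Each supported (function, argument types) combination is registered explicitly, so `add(timestamp, duration)` can exist without `add(timestamp, timestamp)`. `KernelRegistry` and its methods are hypothetical and far simpler than Arrow's actual function registry.

```c++
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical, simplified registry keyed on (function, lhs type, rhs type).
// It illustrates signature-based dispatch; it is not Arrow's registry API.
using Signature = std::tuple<std::string, std::string, std::string>;

class KernelRegistry {
 public:
  // Registers one supported signature together with its output type.
  void Add(const std::string& func, const std::string& lhs,
           const std::string& rhs, const std::string& out) {
    bool inserted = kernels_.emplace(Signature{func, lhs, rhs}, out).second;
    assert(inserted && "duplicate kernel signature");
    (void)inserted;  // avoid an unused-variable warning when NDEBUG is set
  }

  // Returns the output type for an exact signature match, or nullptr if no
  // kernel was registered for these argument types.
  const std::string* Dispatch(const std::string& func, const std::string& lhs,
                              const std::string& rhs) const {
    auto it = kernels_.find(Signature{func, lhs, rhs});
    return it == kernels_.end() ? nullptr : &it->second;
  }

 private:
  std::map<Signature, std::string> kernels_;
};
```

Combinations that were never registered simply fail to dispatch. That is how a timestamp plus a duration can be supported while a timestamp plus a timestamp is not.
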
---
Another solution would be to formalize what we have (de facto) done thus far: allow casting to and from storage types, so that operating on storage happens only when explicitly requested.
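
Here is a minimal sketch of this explicit-cast approach. It assumes hypothetical `ExtensionArray` and `StorageArray` types and hypothetical `CastToStorage` and `CastFromStorage` helpers, none of which are Arrow APIs. A storage-only kernel such as `Filter` below cannot be called on an extension array directly, so the caller has to unwrap and re-wrap explicitly.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not Arrow APIs.
struct StorageArray {
  std::vector<long long> values;
};

struct ExtensionArray {
  std::string extension_name;
  StorageArray storage;
};

// Explicit cast: extension -> storage.
StorageArray CastToStorage(const ExtensionArray& arr) { return arr.storage; }

// Explicit cast: storage -> extension (the caller vouches for validity).
ExtensionArray CastFromStorage(std::string name, StorageArray storage) {
  return ExtensionArray{std::move(name), std::move(storage)};
}

// A generic kernel that only knows about storage; there is deliberately no
// implicit conversion from ExtensionArray to StorageArray.
StorageArray Filter(const StorageArray& arr, const std::vector<bool>& mask) {
  assert(arr.values.size() == mask.size());
  StorageArray out;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) out.values.push_back(arr.values[i]);
  }
  return out;
}
```

Calling `Filter(ids, mask)` on an `ExtensionArray` would not compile; writing `CastFromStorage(ids.extension_name, Filter(CastToStorage(ids), mask))` makes the round trip through storage visible at the call site.
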