bkietz commented on PR #39200:
URL: https://github.com/apache/arrow/pull/39200#issuecomment-1852676161

   > I would suggest we perhaps need a more general semantic description of 
storage type equivalence.
   
   Another way to handle this would be with a property on kernels (or even 
functions) instead of on ExtensionType:
   
   ```c++
   class ARROW_EXPORT InputType {
    public:
     bool accepts_extension_types_if_storage_matches = false;
     // ...
   };

   // ... elsewhere, opting in for one kernel:
   filter_kernel->signature->in_types()[0].accepts_extension_types_if_storage_matches = true;
   ```
   
   This would have the effect of disabling operation on extension arrays' 
storage by default, but allow enabling it per kernel or function. (The filter 
for accepted extension types could of course also be more articulated than a 
simple `bool`.)
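
To make the default-off, opt-in behavior concrete, here is a minimal self-contained sketch. It deliberately does not use Arrow's real classes: `ToyType`, `ToyInputType`, `Plain`, and `Extension` are hypothetical stand-ins, and only the flag name comes from the proposal above.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for illustration only; these are not Arrow's real classes.
struct ToyType {
  std::string name;                        // e.g. "int64"
  std::shared_ptr<const ToyType> storage;  // non-null only for extension types

  bool is_extension() const { return storage != nullptr; }
};

// Hypothetical helpers for building toy types.
inline ToyType Plain(std::string name) {
  return ToyType{std::move(name), nullptr};
}
inline ToyType Extension(std::string name, ToyType storage) {
  return ToyType{std::move(name),
                 std::make_shared<const ToyType>(std::move(storage))};
}

struct ToyInputType {
  std::string expected_storage;  // name of the type the kernel was written for
  // The proposed flag: operating on extension storage is off by default.
  bool accepts_extension_types_if_storage_matches = false;

  bool Matches(const ToyType& type) const {
    if (!type.is_extension()) return type.name == expected_storage;
    // Extension types match only if the kernel opted in and the storage
    // type is one the kernel already accepts.
    return accepts_extension_types_if_storage_matches &&
           type.storage->name == expected_storage;
  }
};
```

With this toy model, a default-constructed `ToyInputType` expecting `"int64"` rejects `Extension("my_id", Plain("int64"))`, while a copy with the flag set to `true` accepts it.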
   
   I think this would put configurability where it needs to be, since the 
specific categories (selection, equality comparison, and arithmetic) differ in 
how difficult they are to support:
   - I can't think of a type which would not be selectable by operating on its 
storage, unless the semantics of an array slot depended on its position in the 
array or on its neighbors somehow, which seems incompatible with Arrow arrays 
anyway.
   - Equality comparison for extension types would have to be a superset of 
equality comparison for their storage: if the storage slots are identical then, 
whatever the extension type, the corresponding slots in the extension array 
must be identical also. The converse need not hold, however: it would be 
possible to define an extension of fixed size list wherein the semantic value 
of a slot is the sum of the list elements in the storage's slot, in which case 
`[1, 2]` would be equal to `[1, 2]` but also equal to `[3, 0]`.
   - Arithmetic is much more complex, since even if a type supports addition it 
might not support multiplication, and it might only support an addend of a 
specific other type (as timestamps can be added to durations but not to other 
timestamps).
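
The equality bullet can be illustrated with a small sketch. It models a slot of the hypothetical "sum of fixed size list" extension as a `std::array<int, 2>`; the names `Slot`, `StorageEqual`, `SemanticValue`, and `SemanticEqual` are made up for this example and are not part of Arrow.

```cpp
#include <array>
#include <numeric>

// One slot of a fixed-size-list<int, 2> storage array (toy model).
using Slot = std::array<int, 2>;

// Equality as the storage type sees it: element-wise comparison.
inline bool StorageEqual(const Slot& a, const Slot& b) { return a == b; }

// Semantic value under the hypothetical extension: the sum of the elements.
inline int SemanticValue(const Slot& s) {
  return std::accumulate(s.begin(), s.end(), 0);
}

// Equality as the extension type would define it.
inline bool SemanticEqual(const Slot& a, const Slot& b) {
  return SemanticValue(a) == SemanticValue(b);
}
```

Storage equality implies semantic equality here, but not the other way around: `{1, 2}` and `{3, 0}` differ as storage yet are equal under the extension's semantics, so a kernel comparing storage alone would give the wrong answer for this type.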
   
   Given that supporting various operations on the stored types is so nuanced, 
it seems we'd need to reserve handling of it for a system which can express 
those nuances - like kernel dispatch.
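
As a rough analogy for how dispatch can express such nuances, the sketch below encodes the timestamp/duration rules as C++ overloads: only the supported combinations have a signature. The `Timestamp` and `Duration` structs are toy types invented for this example, not Arrow's, and overload resolution here merely stands in for kernel dispatch.

```cpp
#include <cstdint>

// Toy value types; both would share int64 storage.
struct Duration {
  std::int64_t ticks;
};
struct Timestamp {
  std::int64_t ticks_since_epoch;
};

// Supported: timestamp + duration -> timestamp.
inline Timestamp operator+(Timestamp t, Duration d) {
  return Timestamp{t.ticks_since_epoch + d.ticks};
}

// Supported: duration + duration -> duration.
inline Duration operator+(Duration a, Duration b) {
  return Duration{a.ticks + b.ticks};
}

// Supported: timestamp - timestamp -> duration.
inline Duration operator-(Timestamp a, Timestamp b) {
  return Duration{a.ticks_since_epoch - b.ticks_since_epoch};
}

// Deliberately absent: operator+(Timestamp, Timestamp) and any operator*.
// Adding the int64 storage values directly would compile and run, but the
// result would be meaningless for these types.
```

Each overload plays the role of one kernel signature: the set of signatures, not the storage type, decides which operations are valid.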
   
   ---
   
   Another solution would be to formalize what we (de facto) have done thus 
far: allow casting to/from storage types, and operate on storage only when 
explicitly requested.
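
A minimal sketch of that explicit round trip, again with toy types rather than Arrow's API: `ToyExtensionArray`, `ToStorage`, `FromStorage`, and this `Filter` are hypothetical names used only to show the shape of the workflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using StorageArray = std::vector<std::int64_t>;

// Toy extension array: a name plus the underlying storage.
struct ToyExtensionArray {
  std::string extension_name;
  StorageArray storage;
};

// Explicit "cast to storage": the caller asks for the raw values.
inline const StorageArray& ToStorage(const ToyExtensionArray& arr) {
  return arr.storage;
}

// Explicit "cast from storage": the caller re-wraps the result.
inline ToyExtensionArray FromStorage(std::string name, StorageArray storage) {
  return ToyExtensionArray{std::move(name), std::move(storage)};
}

// A kernel defined on storage only; it knows nothing about extensions.
inline StorageArray Filter(const StorageArray& values,
                           const std::vector<bool>& mask) {
  StorageArray out;
  for (std::size_t i = 0; i < values.size() && i < mask.size(); ++i) {
    if (mask[i]) out.push_back(values[i]);
  }
  return out;
}
```

A caller would write `FromStorage(arr.extension_name, Filter(ToStorage(arr), mask))`, so the decision to operate on storage is visible at the call site instead of happening implicitly.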

