paleolimbot commented on PR #39200:
URL: https://github.com/apache/arrow/pull/39200#issuecomment-1854005319

   I quite like @pitrou's description of equivalence between a type and its 
storage, which lets extension type authors get a lot of mileage out of existing 
internals for simple cases. For example, you probably want 
`group_by(<some_uuid>)` + aggregate to "just work" and it's unrealistic for 
extension type authors to remember or define the appropriate internals to make 
that happen (if it's even possible today).
   
   Allowing an implicit or automatic cast to storage seems like an unsafe 
precedent; however, I can't currently think of an example where an 
explicit `Cast(<some extension array>, <its own storage type>)` would be 
inappropriate (maybe that works today, I haven't tried).
   
   > Imo it goes against the whole purpose of extension types if the only way 
to support them is for every kernel to be aware of all possible extension types.
   
   I think it is up to extension type authors to decide what *logical* 
manipulation can and cannot be performed on an array. The extension type 
purpose (IMO) is that implementations take care of the *physical* manipulations 
(filter, take, slice, concatenate, read/write files). What I like about 
Antoine's suggestion is that it makes opting in to more storage behaviour very 
easy (since many extension types probably want to opt in to some or all storage 
behaviour) but is safe by default.
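
To make the "easy to opt in, safe by default" idea concrete, here is a toy model in plain Python. It is only a sketch and is not the Arrow API: `ToyExtensionType`, `ToyExtensionArray`, `storage_ops`, and `group_sum` are hypothetical names invented for illustration. Physical operations such as `take` always work on the storage, while a logical operation such as grouping only falls back to storage when the type author has opted in.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyExtensionType:
    """Hypothetical extension type: a name, a storage type, and opt-ins."""
    name: str
    storage_type: str
    # Logical operations the type author explicitly delegates to storage.
    # Empty by default, so nothing logical is inherited implicitly.
    storage_ops: frozenset = frozenset()


@dataclass
class ToyExtensionArray:
    type: ToyExtensionType
    storage: list

    def take(self, indices):
        # Physical manipulation: always available, operates on storage only.
        return ToyExtensionArray(self.type, [self.storage[i] for i in indices])


def group_sum(keys, values):
    """Sum `values` per key, but only if the key type opted in to grouping."""
    if "group_by" not in keys.type.storage_ops:
        raise TypeError(f"{keys.type.name} has not opted in to 'group_by'")
    totals = defaultdict(int)
    for key, value in zip(keys.storage, values):
        totals[key] += value
    return dict(totals)


# A UUID-like type whose (hypothetical) author opts in to grouping by storage.
uuid_type = ToyExtensionType(
    "toy.uuid", "fixed_size_binary(16)", frozenset({"group_by"})
)
# A type with the same storage that keeps the safe default (no opt-ins).
opaque_type = ToyExtensionType("toy.opaque", "fixed_size_binary(16)")

ids = [b"a" * 16, b"b" * 16, b"a" * 16]
uuid_totals = group_sum(ToyExtensionArray(uuid_type, ids), [1, 2, 3])

try:
    group_sum(ToyExtensionArray(opaque_type, ids), [1, 2, 3])
    opaque_rejected = False
except TypeError:
    opaque_rejected = True
```

In this sketch both types share the same storage, yet only the one that opted in can be grouped; `take` works for either because it never interprets the values.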


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
