paleolimbot commented on PR #39200: URL: https://github.com/apache/arrow/pull/39200#issuecomment-1854005319
I quite like @pitrou's description of equivalence between a type and its storage, which lets extension type authors get a lot of mileage out of existing internals for simple cases. For example, you probably want `group_by(<some_uuid>)` + aggregate to "just work", and it's unrealistic to expect extension type authors to remember or define the appropriate internals to make that happen (if it's even possible today). Allowing an implicit or automatic cast to storage seems like an unsafe precedent; however, I can't currently think of an example where an explicit `Cast(<some extension array>, <its own storage type>)` would be inappropriate (maybe that works today, I haven't tried).

> Imo it goes against the whole purpose of extension types if the only way to support them is for every kernel to be aware of all possible extension types.

I think it is up to extension type authors to decide what *logical* manipulations can and cannot be performed on an array. The purpose of extension types (IMO) is that implementations take care of the *physical* manipulations (filter, take, slice, concatenate, read/write files).

What I like about Antoine's suggestion is that it makes opting in to more storage behaviour very easy (since many extension types probably want to opt in to some or all storage behaviour) but is safe by default.
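To make the "safe by default, easy opt-in" idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Arrow's API: `ExtensionArray`, `storage_kernels`, `cast_to_storage`, and `count_by_key` are all hypothetical names invented for illustration, and the opt-in mechanism shown is only one assumed way such a design could look.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionArray:
    """Toy stand-in for an extension array: a type name plus plain storage values."""

    type_name: str
    storage: tuple
    # Hypothetical opt-in list: storage kernels the type author allows.
    # Empty by default, so nothing falls through to storage implicitly.
    storage_kernels: frozenset = frozenset()


def cast_to_storage(arr):
    """Explicit cast: the caller deliberately drops the extension semantics."""
    return arr.storage


def count_by_key(values):
    """A 'kernel' that only understands plain storage values."""
    if isinstance(values, ExtensionArray):
        # No implicit cast: the extension type must have opted in.
        if "count_by_key" not in values.storage_kernels:
            raise TypeError(f"count_by_key is not enabled for {values.type_name}")
        values = values.storage
    return dict(Counter(values))


ids = (b"a" * 16, b"b" * 16, b"a" * 16)

# Default: the kernel refuses the extension array, but an explicit cast works.
strict_uuid = ExtensionArray("example.uuid", ids)
counts_via_cast = count_by_key(cast_to_storage(strict_uuid))

# Opt-in: the type author declares that grouping on storage is meaningful.
grouping_uuid = ExtensionArray("example.uuid", ids, frozenset({"count_by_key"}))
counts_via_opt_in = count_by_key(grouping_uuid)
```

In this sketch both routes produce the same counts; the difference is who makes the decision. The explicit cast is the caller's choice, while the opt-in is the extension type author's.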
