seddonm1 commented on pull request #9376: URL: https://github.com/apache/arrow/pull/9376#issuecomment-778716899
> I wonder if it is possible to get the best of both worlds, by extending the `Array` trait slightly and having a subclass of `Array` which denotes scalars, like `ScalarArray` which do not need `Buffer` storage and just represents constant scalars. This way, functions would only need to deal with Array, but can recognize this subclass `ScalarArray` and do optimizations that way. This is a very half-formed thought at the moment. The train of thought is just what if the `Array` was not strictly a buffer-based representation but just a way to access columnar data, and in certain cases represents scalars. I also have many instances where knowing the `ScalarArray` vs `Array` would provide huge performance opportunities in the big PR implement Postgres functions : https://github.com/apache/arrow/pull/9243 - especially for things like Regex matching. It would be relatively trivial to implement if it could be pattern matched. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
