alamb commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2372367122
The idea of dynamically switching to different physical encodings is neat, but I think it presumes that all operators / functions can handle any of the different encodings (RLE, Dict, String, etc.), which is not the case today.

> Benefit?

I think another benefit of the current type system is that the implementations of functions (and operators, etc.) declare what types of arrays (physical encodings) they have specializations for, and then the optimizers and analyzers ensure that the types line up and try to minimize conversions at runtime.

For example, in a query like this that computes several operations on a column

```sql
SELECT COUNT(*), substr(url, 1, 4), regexp_match(url, '%google.com%')
FROM ...
GROUP BY url
```

If we change the hash group by operator to return `StringViewArray` for certain grouping operations when the input is a `StringArray`, and the column is used twice (once in `substr` and once in `regexp_match`, neither of which has a specialized code path for `StringViewArray`), we have to be careful that the array is not cast to `StringArray` twice.

> In my mind, at some point we need to build the arrow's Array. How do we build it if we don't know what type is it?

I think @jayzhan211's question is the key one in my mind. At some point you need to get the actual array and have code that operates on exactly that type. Figuring out at what point this conversion happens is important.
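To make the "don't cast twice" concern concrete, here is a minimal sketch in plain Rust. It is not DataFusion's or arrow-rs's API: `Encoding`, `FunctionSig`, and `plan_casts` are hypothetical names invented for illustration. The idea is that each function declares which encodings it has specialized code for, and a planning step collects the distinct casts needed for a column so each one can be computed once and shared.

```rust
use std::collections::BTreeSet;

/// Hypothetical physical encodings of a string column (illustration only).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Encoding {
    String,
    StringView,
    Dictionary,
}

/// Hypothetical signature: the encodings a function has specialized code for.
struct FunctionSig {
    name: &'static str,
    supported: &'static [Encoding],
}

/// For one input column, collect the distinct cast targets required by the
/// functions that consume it. Because the result is a set, two functions that
/// both need `String` produce a single cast that both can share.
fn plan_casts(input: Encoding, funcs: &[FunctionSig]) -> BTreeSet<Encoding> {
    let mut casts = BTreeSet::new();
    for f in funcs {
        if !f.supported.contains(&input) {
            // Assumption: the first supported encoding is the preferred target.
            if let Some(&target) = f.supported.first() {
                casts.insert(target);
            }
        }
    }
    casts
}

fn main() {
    let funcs = [
        FunctionSig { name: "substr", supported: &[Encoding::String] },
        FunctionSig { name: "regexp_match", supported: &[Encoding::String] },
    ];
    // The group-by output is StringView, but neither function handles it.
    let casts = plan_casts(Encoding::StringView, &funcs);
    for f in &funcs {
        println!("{} accepts {:?}", f.name, f.supported);
    }
    println!("distinct casts needed: {}", casts.len());
}
```

In this sketch both `substr` and `regexp_match` require a cast from `StringView` to `String`, yet the plan contains only one cast. A real planner would also have to decide where in the plan that shared conversion happens, which is the open question raised above.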