alamb commented on issue #11513:
URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2372367122

   The idea of dynamically switching to different physical encodings is neat, 
but I think it presumes all operators / functions can handle any of the 
different encodings (RLE, Dict, String, etc.), which is not the case today.
   
   > Benefit?
   
   I think another benefit of the current type system is that the 
implementations of functions (and operators, etc) declare what types of arrays 
(physical encodings) they have specializations for and then the optimizers and 
analyzers ensure that the types line up and try to minimize conversions at 
runtime.
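
A minimal sketch of that idea, using only the Rust standard library. The names `Encoding`, `FunctionSig`, and `required_cast` are hypothetical and are not DataFusion APIs; the point is only that a function declares which encodings it has kernels for, and a planner can decide ahead of time whether a cast is needed:

```rust
// Hypothetical model of physical encodings; not DataFusion or arrow-rs types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding {
    String,
    StringView,
    Dictionary,
}

// A function advertises the encodings it has specialized kernels for.
struct FunctionSig {
    name: &'static str,
    supported: &'static [Encoding],
}

/// Returns the encoding the input must be cast to before calling `func`,
/// or `None` if the function can consume the input as is.
fn required_cast(func: &FunctionSig, input: Encoding) -> Option<Encoding> {
    if func.supported.contains(&input) {
        None
    } else {
        // Assumption for this sketch: fall back to the first declared encoding.
        func.supported.first().copied()
    }
}
```

With a signature that declares only `Encoding::String`, `required_cast` reports a cast for a `StringView` input and no cast for a `String` input, so the decision is made at plan time rather than inside the kernel.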
   
   For example, consider a query like this that computes several operations 
on a column:
   
   ```sql
   SELECT COUNT(*), substr(url, 1, 4), regexp_match(url, '%google.com%')
   FROM ...
   GROUP BY url
   ```
   
   If we change the hash group by operator to return `StringViewArray` for 
certain grouping operations when the input is a `StringArray`, and the column 
is used twice (once in `substr` and once in `regexp_match`, which don't have 
specialized code paths for `StringViewArray`), we have to be careful that the 
array is not cast to `StringArray` twice.
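
One way to picture avoiding the double cast is a small cache keyed by target encoding, so the conversion is materialized once and shared by both consumers. This is a sketch under stated assumptions: `Column`, `CastCache`, and `get_or_cast` are hypothetical names, the "cast" is simulated by cloning the values, and a real planner would more likely deduplicate the cast expression in the plan itself:

```rust
use std::collections::HashMap;

// Hypothetical model of physical encodings; not arrow-rs types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Encoding {
    String,
    StringView,
}

#[derive(Debug, Clone)]
struct Column {
    encoding: Encoding,
    values: Vec<String>,
}

// Remembers conversions of one source column, keyed by target encoding.
struct CastCache {
    cache: HashMap<Encoding, Column>,
    casts_performed: usize,
}

impl CastCache {
    fn new() -> Self {
        CastCache { cache: HashMap::new(), casts_performed: 0 }
    }

    /// Returns `source` converted to `target`, converting at most once per target.
    fn get_or_cast(&mut self, source: &Column, target: Encoding) -> &Column {
        if !self.cache.contains_key(&target) {
            // Simulated cast: copy the values and relabel the encoding.
            self.casts_performed += 1;
            let converted = Column { encoding: target, values: source.values.clone() };
            self.cache.insert(target, converted);
        }
        &self.cache[&target]
    }
}
```

If both `substr` and `regexp_match` ask the cache for the `String` form of the same `StringView` column, `casts_performed` ends at 1 rather than 2.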
   
   > In my mind, at some point we need to build the arrow's Array. How do we 
build it if we don't know what type is it?
   
   I think @jayzhan211's question is the key one. At some point you 
need to get the actual array and have code that operates on exactly that type. 
Figuring out at what point this conversion happens is important.
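
To illustrate what "operating on exactly that type" means, here is a std-only sketch of downcasting a type-erased array to a concrete encoding before running a kernel. The trait `Array` and the types `PlainStrings` and `DictStrings` are hypothetical stand-ins defined here, not the arrow-rs types; they only mimic the general pattern of recovering a concrete type through `std::any::Any`:

```rust
use std::any::Any;

// Hypothetical type-erased array; a stand-in, not the arrow-rs trait.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

// Plain encoding: one owned string per row.
struct PlainStrings(Vec<String>);

// Dictionary encoding: each row is an index into `values`.
struct DictStrings {
    keys: Vec<usize>,
    values: Vec<String>,
}

impl Array for PlainStrings {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for DictStrings {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Sums the byte length of every row. The kernel has to recover the concrete
/// encoding first; an encoding it has no code path for yields `None`.
fn total_bytes(array: &dyn Array) -> Option<usize> {
    if let Some(plain) = array.as_any().downcast_ref::<PlainStrings>() {
        Some(plain.0.iter().map(|s| s.len()).sum())
    } else if let Some(dict) = array.as_any().downcast_ref::<DictStrings>() {
        Some(dict.keys.iter().map(|&k| dict.values[k].len()).sum())
    } else {
        None
    }
}
```

The `None` branch is where a conversion would have to be inserted, which is why it matters whether that decision is made during planning or discovered at runtime.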
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org