andygrove commented on PR #789:
URL: https://github.com/apache/datafusion-comet/pull/789#issuecomment-2276484812

   > > is it possible to post any kind of benchmarks to show the improvements?
   > 
   > I think the expected improvements from this are based on a heuristic that 
it is beneficial to continue to use dictionary encoding whenever possible. So the 
improvement will depend on what other expressions are used in combination with 
it. Therefore I think it would be possible to construct a benchmark to show almost 
any or no benefit. Do you have a more specific benchmark in mind, or 
what worry do we want to resolve with it?
   
   I'd like to have a go at proving the benefit of this PR (to help with my own 
understanding of how dictionary types can affect performance). I am thinking of 
running something like this:
   
   ```sql
   SELECT s FROM (SELECT struct(foo, bar) AS s FROM tbl) t WHERE s.foo RLIKE '^[A-Z]{1}'
   ```
   
   My hypothesis is that this will be faster if we preserve the dictionary type 
because 1) the regexp can be evaluated once per distinct dictionary value rather 
than once per row, and 2) we avoid the cost of unpacking the dictionary in the 
first place.
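
   To make the first point of the hypothesis concrete, here is a minimal Python 
sketch of the idea. It is not Comet or DataFusion code: the function names 
`rlike_plain` and `rlike_dictionary` and the sample data are hypothetical, nulls 
are ignored, and `re.search` stands in for RLIKE's find-anywhere matching.

```python
import re


def rlike_plain(values, pattern):
    """Unpacked column: the regexp runs once per row."""
    rx = re.compile(pattern)
    return [rx.search(v) is not None for v in values]


def rlike_dictionary(dictionary, indices, pattern):
    """Dictionary-encoded column: the regexp runs once per distinct value,
    then each row looks up the cached result through its index."""
    rx = re.compile(pattern)
    matches = [rx.search(v) is not None for v in dictionary]
    return [matches[i] for i in indices]


# Hypothetical sample data: 3 distinct values referenced by 8 rows.
dictionary = ["Apple", "banana", "Cherry"]
indices = [0, 1, 1, 2, 0, 1, 2, 2]
pattern = r"^[A-Z]{1}"

# Unpacking the dictionary is the extra cost the plain path has to pay first.
unpacked = [dictionary[i] for i in indices]

# Both paths agree; the dictionary path ran the regexp 3 times instead of 8.
assert rlike_dictionary(dictionary, indices, pattern) == rlike_plain(unpacked, pattern)
```

   In this toy case the saving is 3 regexp evaluations instead of 8. In a real 
benchmark the benefit would depend on how many distinct values each batch holds 
relative to its row count.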


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

