andygrove commented on PR #789:
URL: https://github.com/apache/datafusion-comet/pull/789#issuecomment-2276484812
> > is it possible to post any kind of benchmarks to show the improvements?
>
> I think the expected improvements from this are based on a heuristic that it is beneficial to continue to use dictionary encoding whenever possible. So the improvement will depend on what other expressions are used in combination with it. Therefore I think it would be possible to construct a benchmark to show almost any or no benefit. Do you have a more specific benchmark in mind, or what worry do we want to resolve with it?

I'd like to have a go at proving the benefit of this PR (to help with my own understanding of how dictionary types can affect performance). I am thinking of running something like this:

```sql
-- The struct is built in a subquery so that the alias `s` is visible to WHERE.
SELECT s
FROM (SELECT struct(foo, bar) AS s FROM tbl)
WHERE s.foo RLIKE '^[A-Z]{1}'
```

My hypothesis is that this will be faster if we preserve the dictionary type because 1) the regexp can be evaluated on fewer values (the distinct dictionary entries rather than every row), and 2) we avoid the cost of unpacking the dictionary in the first place.
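
To make the first point of the hypothesis concrete, here is a minimal sketch in plain Python. It is not Comet or Arrow code: the function names, the sample `dictionary` and `keys`, and the choice to treat a null row as a non-match (as a `WHERE` filter would) are all assumptions made for illustration. It compares evaluating the regexp per row on an unpacked column with evaluating it once per distinct dictionary value and then looking up each row's key.

```python
import re


def rlike_plain(values, pattern):
    """Evaluate the regexp once per row on an unpacked (plain) column."""
    rx = re.compile(pattern)
    # A null row is treated as a non-match, as a WHERE filter would drop it.
    return [v is not None and rx.search(v) is not None for v in values]


def rlike_dictionary(keys, dictionary, pattern):
    """Evaluate the regexp once per distinct value, then look up each key."""
    rx = re.compile(pattern)
    # One regexp evaluation per dictionary entry, not per row.
    matches = [rx.search(v) is not None for v in dictionary]
    return [k is not None and matches[k] for k in keys]


# Hypothetical dictionary-encoded column: 3 distinct values, 8 rows.
dictionary = ["Alpha", "beta", "Gamma"]
keys = [0, 1, 1, 2, 0, None, 1, 2]

# Unpacking the dictionary materializes one string per row.
plain = [None if k is None else dictionary[k] for k in keys]

mask_plain = rlike_plain(plain, r"^[A-Z]{1}")
mask_dict = rlike_dictionary(keys, dictionary, r"^[A-Z]{1}")
```

Both masks select the same rows, but the dictionary path runs the regexp 3 times instead of 7 (the non-null rows) and never builds the `plain` list. Whether this carries over to a real engine depends on the ratio of distinct values to rows, which is what the proposed benchmark would need to vary.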