alamb opened a new pull request, #20749:
URL: https://github.com/apache/datafusion/pull/20749

   Draft until:
   - [ ] New tests merge: https://github.com/apache/datafusion/pull/20723
   - [ ] Run performance benchmarks
   
   
   ## Which issue does this PR close?
   
   - Part of #18489 
   - Closes https://github.com/apache/datafusion/pull/20180
   - Closes https://github.com/apache/datafusion/issues/15524
   - Replaces https://github.com/apache/datafusion/pull/20665
   
   ## Rationale for this change
   
   I [want DataFusion to be the fastest Parquet engine on 
ClickBench](https://github.com/apache/datafusion/issues/18489). One of the 
queries where DataFusion is significantly slower is Query 29, which has a very 
strange pattern of many aggregate functions that are offset by a constant:
   
   
https://github.com/apache/datafusion/blob/0ca9d6586a43c323525b2e299448e0f1af4d6195/benchmarks/queries/clickbench/queries/q29.sql#L4
   
   This is not a pattern I have ever seen in a real query, but it seems like 
the engine currently at the top of the ClickBench leaderboard has a special 
case for this pattern. ClickHouse probably does too. See 
   - https://github.com/duckdb/duckdb/pull/15017
   - Discussion on https://github.com/apache/datafusion/issues/15524
   
   Thus I reluctantly conclude that we should have one too. 
   
   ## What changes are included in this PR?
   
   This is an alternative to my first attempt: 
    - https://github.com/apache/datafusion/pull/20665
   
   In particular, since this is such a ClickBench-specific rule, I wanted to:
   1. Minimize the downstream API / upgrade impact (i.e. not change existing 
APIs)
   2. Optimize performance for the case where this rewrite does not apply (most 
of the time)
   
   1. Add a rewrite `SUM(expr + scalar)` --> `SUM(expr) + scalar*COUNT(expr)` 
   2. Tests for same
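
The rewrite relies on the identity that summing `expr + scalar` over the non-null rows equals the sum of `expr` plus `scalar` times the number of non-null rows. The following is a minimal Python sketch of that identity, not the DataFusion implementation: the function names `sum_direct` and `sum_rewritten` are hypothetical, `None` stands in for SQL NULL, and integer overflow and floating-point rounding are deliberately ignored.

```python
from typing import Optional, Sequence


def sum_direct(values: Sequence[Optional[int]], scalar: int) -> Optional[int]:
    """Model SUM(expr + scalar): NULL + scalar is NULL, and SUM skips NULLs."""
    shifted = [v + scalar for v in values if v is not None]
    # SUM over zero non-null rows is NULL.
    return sum(shifted) if shifted else None


def sum_rewritten(values: Sequence[Optional[int]], scalar: int) -> Optional[int]:
    """Model SUM(expr) + scalar * COUNT(expr)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        # SUM(expr) is NULL here, so the whole expression is NULL.
        return None
    base_sum = sum(non_null)   # SUM(expr)
    count = len(non_null)      # COUNT(expr) counts only non-null rows
    return base_sum + scalar * count


if __name__ == "__main__":
    column = [1, None, 4, 7]
    # Many offsets can share the same underlying SUM and COUNT.
    for offset in range(4):
        assert sum_direct(column, offset) == sum_rewritten(column, offset)
```

In this sketch the benefit shows up when many aggregates differ only in the constant: `SUM(expr)` and `COUNT(expr)` need to be computed once, and each offset then costs one multiplication and one addition.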
   
   Note there are quite a few other ideas for making this more general 
in https://github.com/apache/datafusion/issues/15524, but I am going with the 
simple approach of making it work for the use case we have in hand (ClickBench).
   
   ## Are these changes tested?
   
   Yes, new tests are added
   
   ## Are there any user-facing changes?
   
   Faster performance
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

