alamb commented on PR #11627:
URL: https://github.com/apache/datafusion/pull/11627#issuecomment-2254523568

   I also tried out Q32 (which uses `AVG`, so it can't use this optimization 
yet) with the `AVG` removed and target partitions set to something silly 
(1000). I see this PR making a substantial difference (about 6.4s vs 7.8s).
   
   ### 1000 partitions, this PR
   
   ```shell
   andrewlamb@Andrews-MacBook-Pro-2:~/Downloads$ ./datafusion-cli-skip-partial -c "set datafusion.execution.target_partitions = 1000; SELECT \"WatchID\", \"ClientIP\", COUNT(*) AS c, SUM(\"IsRefresh\") FROM 'hits.parquet' GROUP BY \"WatchID\", \"ClientIP\" ORDER BY c DESC LIMIT 10;"
   
   Elapsed 0.001 seconds.
   
   +---------------------+-------------+---+-----------------------------+
   | WatchID             | ClientIP    | c | sum(hits.parquet.IsRefresh) |
   +---------------------+-------------+---+-----------------------------+
   | 7904046282518428963 | 1509330109  | 2 | 0                           |
   | 8566928176839891583 | -1402644643 | 2 | 0                           |
   | 6655575552203051303 | 1611957945  | 2 | 0                           |
   | 7224410078130478461 | -776509581  | 2 | 0                           |
   | 9102894172721185728 | 1489622498  | 1 | 1                           |
   | 8964981845434484863 | 1822336830  | 1 | 0                           |
   | 6991883311913569583 | -745122562  | 1 | 0                           |
   | 6787783378461221127 | -506600142  | 1 | 0                           |
   | 6042898921955304644 | 2054220936  | 1 | 0                           |
   | 5581365862985039198 | 104944290   | 1 | 0                           |
   +---------------------+-------------+---+-----------------------------+
   10 row(s) fetched.
   Elapsed 6.378 seconds.
   
   ```
   
   ### 1000 partitions, main
   ```shell
   andrewlamb@Andrews-MacBook-Pro-2:~/Downloads$ datafusion-cli -c "set datafusion.execution.target_partitions = 1000; SELECT \"WatchID\", \"ClientIP\", COUNT(*) AS c, SUM(\"IsRefresh\") FROM 'hits.parquet' GROUP BY \"WatchID\", \"ClientIP\" ORDER BY c DESC LIMIT 10;"
   DataFusion CLI v40.0.0
   0 row(s) fetched.
   Elapsed 0.002 seconds.
   
   +---------------------+-------------+---+-----------------------------+
   | WatchID             | ClientIP    | c | sum(hits.parquet.IsRefresh) |
   +---------------------+-------------+---+-----------------------------+
   | 7904046282518428963 | 1509330109  | 2 | 0                           |
   | 8566928176839891583 | -1402644643 | 2 | 0                           |
   | 6655575552203051303 | 1611957945  | 2 | 0                           |
   | 7224410078130478461 | -776509581  | 2 | 0                           |
   | 6780795588237729988 | 1894276368  | 1 | 1                           |
   | 6158430646513894356 | -1557291761 | 1 | 0                           |
   | 8433113762047612962 | 1214823432  | 1 | 0                           |
   | 8783130976633619349 | 1072197582  | 1 | 0                           |
   | 4959259883895284379 | 2023656393  | 1 | 0                           |
   | 6328586531975293675 | 1549952556  | 1 | 1                           |
   +---------------------+-------------+---+-----------------------------+
   10 row(s) fetched.
   Elapsed 7.771 seconds.
   ```
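
For reference, the relative difference between the two elapsed times reported above can be worked out with a short sketch like the one below. It only does arithmetic on the two numbers from the transcripts (6.378s for this PR, 7.771s for main); the variable names are illustrative, `awk` is used simply because plain shell arithmetic is integer-only, and each figure comes from a single run, so treat the result as a rough indication rather than a benchmark.

```shell
# Elapsed times copied from the two transcripts above (single runs each).
pr_secs=6.378     # ./datafusion-cli-skip-partial (this PR)
main_secs=7.771   # datafusion-cli (main)

# awk handles the floating point math; the shell only passes the values in.
summary=$(awk -v pr="$pr_secs" -v main="$main_secs" \
  'BEGIN { printf "%.2fx faster, %.1f%% less time", main / pr, (main - pr) / main * 100 }')

echo "$summary"
```

With these inputs the sketch reports about a 1.22x speedup, i.e. roughly 18% less wall-clock time for this query at 1000 target partitions.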


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

