Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2853367261

   @jayzhan211 @alamb I guess I nearly understand why
   
   It is related to `target_partitions`, we maintain the `intermediate results` 
respectively in each `partition`.
   
   So when `target_partitions` is larger, amount of `intermediate results` in 
one `partition` is smaller.
   
   And when `intermediate results` assigned to `partition` is small enough, the 
blocked approach will almost make no difference (assume the extreme case, the 
`Vec` used to store results never resize).
   
   Here is my benchmark result in a production machine (x86_64 + 16core + 
3699.178CPU MHz + 64G RAM) with different `target_partitions`
   
   - target_partitions = 4 (amazing improvement!)
   ```
   // blocked approach
   Q7: SELECT "WatchID", MIN("ResolutionWidth") as wmin, MAX("ResolutionWidth") 
as wmax, SUM("IsRefresh") as srefresh FROM hits GROUP BY "WatchID" ORDER BY 
"WatchID" DESC LIMIT 10;
   Query 7 iteration 0 took 4996.7 ms and returned 10 rows
   Query 7 iteration 1 took 5177.6 ms and returned 10 rows
   Query 7 iteration 2 took 5335.4 ms and returned 10 rows
   Query 7 iteration 3 took 5232.7 ms and returned 10 rows
   Query 7 iteration 4 took 5350.5 ms and returned 10 rows
   Query 7 avg time: 5218.59 ms
   
   // main
   Q7: SELECT "WatchID", MIN("ResolutionWidth") as wmin, MAX("ResolutionWidth") 
as wmax, SUM("IsRefresh") as srefresh FROM hits GROUP BY "WatchID" ORDER BY 
"WatchID" DESC LIMIT 10;
   Query 7 iteration 0 took 8484.9 ms and returned 10 rows
   Query 7 iteration 1 took 8194.4 ms and returned 10 rows
   Query 7 iteration 2 took 8317.9 ms and returned 10 rows
   Query 7 iteration 3 took 8183.2 ms and returned 10 rows
   Query 7 iteration 4 took 8225.7 ms and returned 10 rows
   Query 7 avg time: 8281.19 ms
   ```
   
   - target_partitions = 8 (emmm... some improvement)
   ```
   // blocked approach
   Q7: SELECT "WatchID", MIN("ResolutionWidth") as wmin, MAX("ResolutionWidth") 
as wmax, SUM("IsRefresh") as srefresh FROM hits GROUP BY "WatchID" ORDER BY 
"WatchID" DESC LIMIT 10;
   Query 7 iteration 0 took 2599.2 ms and returned 10 rows
   Query 7 iteration 1 took 2857.9 ms and returned 10 rows
   Query 7 iteration 2 took 3046.3 ms and returned 10 rows
   Query 7 iteration 3 took 2766.0 ms and returned 10 rows
   Query 7 iteration 4 took 2830.8 ms and returned 10 rows
   Query 7 avg time: 2820.04 ms
   
   // main
   Q7: SELECT "WatchID", MIN("ResolutionWidth") as wmin, MAX("ResolutionWidth") 
as wmax, SUM("IsRefresh") as srefresh FROM hits GROUP BY "WatchID" ORDER BY 
"WatchID" DESC LIMIT 10;
   Query 7 iteration 0 took 3820.0 ms and returned 10 rows
   Query 7 iteration 1 took 3716.6 ms and returned 10 rows
   Query 7 iteration 2 took 3728.3 ms and returned 10 rows
   Query 7 iteration 3 took 3628.2 ms and returned 10 rows
   Query 7 iteration 4 took 3912.6 ms and returned 10 rows
   Query 7 avg time: 3761.15 ms
   ```
   
   - target_partitions = 32 (sadly, almost no improvement...)
   ```
   // blocked approach
   Q7: SELECT "WatchID", MIN("ResolutionWidth") as wmin, MAX("ResolutionWidth") 
as wmax, SUM("IsRefresh") as srefresh FROM hits GROUP BY "WatchID" ORDER BY 
"WatchID" DESC LIMIT 10;
   Query 7 iteration 0 took 1383.7 ms and returned 10 rows
   Query 7 iteration 1 took 1274.3 ms and returned 10 rows
   Query 7 iteration 2 took 1321.7 ms and returned 10 rows
   Query 7 iteration 3 took 1308.1 ms and returned 10 rows
   Query 7 iteration 4 took 1310.6 ms and returned 10 rows
   Query 7 avg time: 1319.68 ms
   
   // main
   Q7: SELECT "WatchID", MIN("ResolutionWidth") as wmin, MAX("ResolutionWidth") 
as wmax, SUM("IsRefresh") as srefresh FROM hits GROUP BY "WatchID" ORDER BY 
"WatchID" DESC LIMIT 10;
   Query 7 iteration 0 took 1440.7 ms and returned 10 rows
   Query 7 iteration 1 took 1430.3 ms and returned 10 rows
   Query 7 iteration 2 took 1357.5 ms and returned 10 rows
   Query 7 iteration 3 took 1352.6 ms and returned 10 rows
   Query 7 iteration 4 took 1342.2 ms and returned 10 rows
   Query 7 avg time: 1384.64 ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to