[GitHub] [arrow-datafusion] allenma commented on pull request #5939: Count distinct support multiple expressions

via GitHub Mon, 10 Apr 2023 21:53:28 -0700


allenma commented on PR #5939:
URL: https://github.com/apache/arrow-datafusion/pull/5939#issuecomment-1502685853


   @Dandandan @ozankabak, I ran the benchmark with the new implementation, and there is actually little performance degradation:
   ```
   Benchmarking aggregate_query_no_group_by_count_distinct_wide: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 61.0s, or reduce sample count to 10.
   aggregate_query_no_group_by_count_distinct_wide
                           time:   [587.01 ms 598.43 ms 611.68 ms]
                           change: [-6.8593% -3.2992% +0.1757%] (p = 0.08 > 0.05)
                           No change in performance detected.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   Benchmarking aggregate_query_no_group_by_count_distinct_narrow: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 40.1s, or reduce sample count to 10.
   aggregate_query_no_group_by_count_distinct_narrow
                           time:   [399.48 ms 415.63 ms 438.63 ms]
                           change: [-1.1234% +3.7277% +10.592%] (p = 0.20 > 0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
     2 (2.00%) high mild
     3 (3.00%) high severe
   ```
   I increased the test array size from 65536 to 134_217_728 to reduce environmental noise. The benchmark command is:
    cargo bench --bench aggregate_query_sql
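
For readers unfamiliar with the output above, here is a minimal sketch of how a Criterion-style verdict can be derived from the `change` confidence interval and the p-value. It is an illustration, not Criterion's actual code: the `Verdict` enum and `classify` function are hypothetical names, and the 0.01 noise threshold is assumed to be the default (the 0.05 significance level is the one shown in the output).

```rust
/// Hypothetical verdict categories, named after the messages a
/// Criterion-style report prints.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,
    Regressed,
}

/// Classify a benchmark comparison.
///
/// `lower` and `upper` bound the relative change as fractions
/// (for example, -0.068593 stands for -6.8593%).
/// `significance_level` and `noise_threshold` are assumptions here:
/// 0.05 matches the output above, 0.01 is assumed to be the default.
fn classify(
    lower: f64,
    upper: f64,
    p_value: f64,
    significance_level: f64,
    noise_threshold: f64,
) -> Verdict {
    // The difference is only treated as real when p is below the
    // significance level.
    if p_value >= significance_level {
        return Verdict::NoChange;
    }
    if upper < -noise_threshold {
        // Whole interval lies below the negative threshold: faster.
        Verdict::Improved
    } else if lower > noise_threshold {
        // Whole interval lies above the positive threshold: slower.
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}
```

With the numbers reported above, `classify(-0.068593, 0.001757, 0.08, 0.05, 0.01)` (wide) and `classify(-0.011234, 0.10592, 0.20, 0.05, 0.01)` (narrow) both yield `Verdict::NoChange`, because each p-value exceeds 0.05.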
   


