jorgecarleitao opened a new pull request #8453:
URL: https://github.com/apache/arrow/pull/8453


   Currently, `mergeExec` parallelizes its work by calling `tokio::spawn` once per
logical thread. However, `tokio::spawn` does not start a thread: it submits a task
(returning a `JoinHandle`, which is itself a future) that the `tokio` runtime then
schedules on its own thread pool.
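Because a spawned task is not a thread, submitting more tasks than there are threads is harmless: a fixed pool simply works through them. The sketch below is a std-only analogue of that idea under stated assumptions. It is not tokio's API or scheduler (tokio runs async tasks on work-stealing queues; here plain closures are pulled from one shared channel by OS threads), and `run_on_pool` and `Job` are hypothetical names.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A boxed unit of work; stands in (hypothetically) for a spawned task.
type Job = Box<dyn FnOnce() -> u64 + Send + 'static>;

/// Hypothetical helper: run every job on `workers` OS threads and return the
/// results in submission order. The number of jobs is independent of the
/// number of workers; the pool, not the caller, bounds concurrency.
fn run_on_pool(workers: usize, jobs: Vec<Job>) -> Vec<u64> {
    let n = jobs.len();
    // Shared job queue; each job carries its index so results can be reordered.
    let (job_tx, job_rx) = mpsc::channel::<(usize, Job)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<(usize, u64)>();

    let handles: Vec<thread::JoinHandle<()>> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is a temporary, so it is released once recv returns.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok((i, job)) => res_tx.send((i, job())).unwrap(),
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();

    for (i, job) in jobs.into_iter().enumerate() {
        job_tx.send((i, job)).unwrap();
    }
    drop(job_tx); // close the queue so idle workers exit
    drop(res_tx); // only the workers' clones keep the result channel open

    let mut results = vec![0u64; n];
    for (i, value) in res_rx {
        results[i] = value;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    results
}

fn main() {
    // Ten jobs on two worker threads: more tasks than threads is fine.
    let jobs: Vec<Job> = (0..10u64).map(|i| Box::new(move || i * i) as Job).collect();
    let squares = run_on_pool(2, jobs);
    println!("{:?}", squares); // [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
}
```

The caller never groups the ten jobs into two batches; it submits all of them and lets the two workers drain the queue.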
   
   Therefore, there is no need to limit the number of tasks to the number of
logical threads: bounding concurrency is already the job of tokio's runtime. In
particular, since we are using the
[`rt-threaded`](https://docs.rs/tokio/0.2.22/tokio/runtime/index.html#threaded-scheduler)
feature, tokio already sizes its thread pool from the number of logical CPUs
available.
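The difference between tying the task count to the thread count and creating one task per partition can be sketched in plain Rust. This is an illustration under assumptions, not DataFusion's code: `default_workers`, `tasks_coupled`, and `tasks_decoupled` are hypothetical names, partitions are modeled as plain indices, and the round-robin assignment in `tasks_coupled` is only one way such a coupling could look. `std::thread::available_parallelism` (Rust 1.59+) stands in for the core count that tokio's threaded scheduler uses as its default pool size.

```rust
use std::thread;

/// Hypothetical helper: one worker per logical CPU, falling back to a single
/// worker if the platform cannot report its available parallelism.
fn default_workers() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Hypothetical "coupled" layout: at most `max_concurrency` tasks, with
/// partitions dealt out round-robin, so the task count tracks the thread count.
fn tasks_coupled(num_partitions: usize, max_concurrency: usize) -> Vec<Vec<usize>> {
    let n = max_concurrency.min(num_partitions).max(1);
    let mut tasks = vec![Vec::new(); n];
    for p in 0..num_partitions {
        tasks[p % n].push(p);
    }
    tasks
}

/// Hypothetical "decoupled" layout: one task per partition; how many run at
/// once is left to the runtime's fixed-size pool.
fn tasks_decoupled(num_partitions: usize) -> Vec<Vec<usize>> {
    (0..num_partitions).map(|p| vec![p]).collect()
}

fn main() {
    let workers = default_workers();
    let partitions = 10;
    println!("pool size: {}", workers);
    println!("coupled tasks: {}", tasks_coupled(partitions, workers).len());
    println!("decoupled tasks: {}", tasks_decoupled(partitions).len()); // 10
}
```

With a pool already sized to the machine, the decoupled layout hands every partition to the runtime as its own task, while the coupled layout caps the task count at the thread count no matter how many partitions there are.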
   
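   This PR removes the coupling in `mergeExec` between the number of logical
threads (`max_concurrency`) and the number of tasks it creates. I observe no
performance regression: criterion reports a ~5% improvement for
`aggregate_query_no_group_by` and no significant change for the other two
benchmarks: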
   
   <details>
    <summary>Benchmark results</summary>
   
   ```
   Switched to branch 'simplify_merge'
   Your branch is up to date with 'origin/simplify_merge'.
      Compiling datafusion v2.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/datafusion)
       Finished bench [optimized] target(s) in 38.02s
        Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/aggregate_query_sql-5241a705a1ff29ae
   Gnuplot not found, using plotters backend
   aggregate_query_no_group_by 15 12
                           time:   [715.17 us 722.60 us 730.19 us]
                           change: [-8.3167% -5.2253% -2.2675%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   aggregate_query_group_by 15 12
                           time:   [5.6538 ms 5.6695 ms 5.6892 ms]
                           change: [+0.1012% +0.5308% +0.9913%] (p = 0.02 < 0.05)
                           Change within noise threshold.
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) high mild
     6 (6.00%) high severe
   
   aggregate_query_group_by_with_filter 15 12
                           time:   [2.6598 ms 2.6665 ms 2.6751 ms]
                           change: [-0.5532% -0.1446% +0.2679%] (p = 0.51 > 0.05)
                           No change in performance detected.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) high mild
     4 (4.00%) high severe
   ```
   
   </details>

