alamb commented on PR #8802: URL: https://github.com/apache/arrow-datafusion/pull/8802#issuecomment-1885710317

TLDR: I don't think this benchmark keeps the number of cores constant.

When I run the first benchmark run that is supposed to use 1 core:

```
Benchmarking parameter group/sink_bs100_tp1/1: Collecting 100 samples in estimated 105.75 s (100 iterations) ...
```

Main uses a single core, as expected.

When I run on the `synnada-ai:upstream/spawn-blocking-for-se` branch, the first benchmark uses almost three cores.

What I think is happening is that the `spawn_blocking` call, as one might expect, runs the workload on a dedicated thread that is not controlled by the tokio runtime's worker thread count:

```rust
let rt = Builder::new_multi_thread()
    .worker_threads(worker_thread)
    .build()
    .unwrap();
```

The fact that the branch uses 3 threads while main can only use one, I think, accounts for the speed differences.

## Methodology

Generate the TPC-H data:

```shell
docker run -v "/Users/andrewlamb/Downloads/tpch_sf0.1":/data -it --rm ghcr.io/databloom-ai/tpch-docker:main -vf -s 0.1
```

I put the benchmark code on this branch: https://github.com/alamb/arrow-datafusion/tree/alamb/write_threads_bench

And ran:

```shell
cargo bench --bench write_threads
```
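To illustrate why `worker_threads` alone does not bound this kind of work, here is a std-only sketch. It is an analogy, not tokio itself: the hypothetical `peak_parallelism` helper spawns each job on its own dedicated OS thread, the way `spawn_blocking` hands work to tokio's separate blocking pool, and measures how many jobs actually run at once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Run `n` jobs, each on its own dedicated OS thread (analogous to
/// work handed off via `spawn_blocking`), and return the peak number
/// of jobs observed running simultaneously.
fn peak_parallelism(n: usize) -> usize {
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let running = Arc::clone(&running);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                // Record how many jobs are in flight right now.
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                // Simulate a blocking workload.
                thread::sleep(Duration::from_millis(100));
                running.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}

fn main() {
    // Even if an async runtime were configured with worker_threads(1),
    // blocking work moved onto dedicated threads like this still runs
    // in parallel, so CPU usage exceeds the worker-thread count.
    println!("peak parallelism: {}", peak_parallelism(3));
}
```

In tokio itself, the blocking pool is capped by `Builder::max_blocking_threads` (default 512), not by `worker_threads`, so fixing the latter alone does not hold core usage constant for a `spawn_blocking`-based workload.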
