peter-toth commented on issue #13692:
URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2577048321

   This is a very interesting issue.
   I was trying to reproduce the results of the above experiment, but got matching 
results with rayon and tokio. Maybe rayon is slightly faster on my M3:
   
   ```
   % cargo run --release --bin tokio -- --cpu-duration 1s --concurrency 7
       Finished `release` profile [optimized] target(s) in 0.46s
        Running `target/release/tokio --cpu-duration 1s --concurrency 7`
   Average duration of 1061 ms (IO 57 ms) over 1 samples, throughput 0.94205743 rps
   Average duration of 1060 ms (IO 57 ms) over 7 samples, throughput 6.741966 rps
   Average duration of 1039 ms (IO 35 ms) over 7 samples, throughput 6.7168765 rps
   Average duration of 1041 ms (IO 37 ms) over 7 samples, throughput 6.7122035 rps
   Average duration of 1039 ms (IO 37 ms) over 7 samples, throughput 6.7568874 rps
   Average duration of 1039 ms (IO 36 ms) over 7 samples, throughput 6.7319846 rps
   Average duration of 1037 ms (IO 34 ms) over 7 samples, throughput 6.7261376 rps
   Average duration of 1042 ms (IO 38 ms) over 7 samples, throughput 6.720298 rps
   ^C
   % cargo run --release --bin rayon -- --cpu-duration 1s --concurrency 7
       Finished `release` profile [optimized] target(s) in 1.89s
        Running `target/release/rayon --cpu-duration 1s --concurrency 7`
   Average duration of 1042 ms (IO 37 ms) over 1 samples, throughput 0.9594442 rps
   Average duration of 1044 ms (IO 41 ms) over 7 samples, throughput 6.8305006 rps
   Average duration of 1028 ms (IO 27 ms) over 7 samples, throughput 6.8629923 rps
   Average duration of 1023 ms (IO 22 ms) over 7 samples, throughput 6.8896527 rps
   Average duration of 1019 ms (IO 18 ms) over 7 samples, throughput 6.9374223 rps
   Average duration of 1027 ms (IO 22 ms) over 7 samples, throughput 6.8580756 rps
   Average duration of 1018 ms (IO 16 ms) over 7 samples, throughput 6.847353 rps
   Average duration of 1018 ms (IO 16 ms) over 7 samples, throughput 6.8783445 rps
   ^C
   ```
   Honestly, I don't get why there would be a significant difference between the 
two, as both apps seem to work the same way.
   We have the main thread that spawns tasks, and those tasks are executed either 
on the 8 threads of tokio or the 7 threads of rayon (more on this later). In 
the rayon app the tokio worker threads don't do anything, do they? So I would 
explain the slight discrepancy by the different work-stealing logic of tokio 
and rayon.
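
As a rough sanity check on the numbers above, Little's law (throughput ≈ requests in flight / average duration) can be applied to the last line of each log. This is only a sketch and not part of the benchmark: `littles_law_rps` is a hypothetical helper, and it assumes `--concurrency` is the number of requests kept in flight.

```rust
/// Little's law for a closed loop: throughput = in-flight requests / average latency.
/// Hypothetical helper for illustration; not part of the benchmark code.
fn littles_law_rps(in_flight: u32, avg_duration_ms: f64) -> f64 {
    f64::from(in_flight) / (avg_duration_ms / 1000.0)
}

fn main() {
    // Average durations taken from the last line of each log above.
    let tokio = littles_law_rps(7, 1042.0);
    let rayon = littles_law_rps(7, 1018.0);
    println!("tokio ~ {tokio:.2} rps, rayon ~ {rayon:.2} rps");
}
```

Both estimates land within about 0.01 rps of the throughput reported on those lines, so the small gap between the two runs is consistent with the small gap in average duration.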
   
   Where I do see a difference is with `--concurrency 8` and above:
   ```
   % cargo run --release --bin tokio -- --cpu-duration 1s --concurrency 8
       Finished `release` profile [optimized] target(s) in 0.25s
        Running `target/release/tokio --cpu-duration 1s --concurrency 8`
   Average duration of 1064 ms (IO 59 ms) over 1 samples, throughput 0.93955696 rps
   Average duration of 1072 ms (IO 69 ms) over 8 samples, throughput 7.634239 rps
   Average duration of 1045 ms (IO 43 ms) over 8 samples, throughput 7.6972647 rps
   Average duration of 1040 ms (IO 37 ms) over 8 samples, throughput 7.66677 rps
   Average duration of 1039 ms (IO 36 ms) over 8 samples, throughput 7.762669 rps
   Average duration of 1037 ms (IO 34 ms) over 8 samples, throughput 7.6279187 rps
   Average duration of 1044 ms (IO 40 ms) over 8 samples, throughput 7.680044 rps
   Average duration of 1042 ms (IO 39 ms) over 8 samples, throughput 7.695585 rps
   ^C
   % cargo run --release --bin rayon -- --cpu-duration 1s --concurrency 8
       Finished `release` profile [optimized] target(s) in 1.48s
        Running `target/release/rayon --cpu-duration 1s --concurrency 8`
   Average duration of 1036 ms (IO 31 ms) over 1 samples, throughput 0.96456635 rps
   Average duration of 1198 ms (IO 196 ms) over 7 samples, throughput 6.9807143 rps
   Average duration of 1167 ms (IO 166 ms) over 7 samples, throughput 6.969797 rps
   Average duration of 1158 ms (IO 156 ms) over 7 samples, throughput 6.9852505 rps
   Average duration of 1145 ms (IO 143 ms) over 7 samples, throughput 6.9620543 rps
   Average duration of 1150 ms (IO 148 ms) over 7 samples, throughput 6.962559 rps
   Average duration of 1148 ms (IO 146 ms) over 7 samples, throughput 6.959907 rps
   ^C
   ```
   But that's because of the aforementioned `.use_current_thread()` initialization 
of the rayon thread pool, which causes the main thread to be part of the rayon 
pool, but work stealing is not initialized there. Removing that line makes the 
two match again.
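
To see why losing one worker would show up as roughly 145 ms of extra latency, here is a back-of-the-envelope model. It is not taken from the benchmark: it assumes a closed loop in which `in_flight` equally long CPU-bound tasks share `workers` threads that actually execute tasks, so that once the workers are saturated the average duration stretches by `in_flight / workers`. The name `expected_duration_ms` is hypothetical.

```rust
/// Expected average task duration in a saturated closed loop (hypothetical model):
/// with more tasks in flight than executing workers, each task waits its turn.
fn expected_duration_ms(in_flight: u32, workers: u32, cpu_ms: f64) -> f64 {
    // No stretch while every in-flight task can have its own worker.
    let stretch = f64::from(in_flight.max(workers)) / f64::from(workers);
    cpu_ms * stretch
}

fn main() {
    // 8 requests in flight, 1 s of CPU each.
    // Assumption based on the comment: only 7 rayon threads pick up tasks,
    // because the main thread is in the pool but does not steal work.
    let rayon = expected_duration_ms(8, 7, 1000.0);
    // tokio: 8 worker threads, so no queueing in this model.
    let tokio = expected_duration_ms(8, 8, 1000.0);
    println!("rayon ~ {rayon:.0} ms, tokio ~ {tokio:.0} ms");
}
```

Under these assumptions the model gives about 1143 ms for 7 workers, in the same range as the 1145 to 1150 ms of the later rayon lines, and 1000 ms of pure CPU time for 8 workers. It ignores the real IO time entirely, so it is a plausibility check and not a measurement.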


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

