peter-toth commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2577048321
This is a very interesting issue. I was trying to reproduce the results of the above experiment, but got matching results with rayon and tokio. Maybe rayon is slightly faster on my M3:

```
% cargo run --release --bin tokio -- --cpu-duration 1s --concurrency 7
    Finished `release` profile [optimized] target(s) in 0.46s
     Running `target/release/tokio --cpu-duration 1s --concurrency 7`
Average duration of 1061 ms (IO 57 ms) over 1 samples, throughput 0.94205743 rps
Average duration of 1060 ms (IO 57 ms) over 7 samples, throughput 6.741966 rps
Average duration of 1039 ms (IO 35 ms) over 7 samples, throughput 6.7168765 rps
Average duration of 1041 ms (IO 37 ms) over 7 samples, throughput 6.7122035 rps
Average duration of 1039 ms (IO 37 ms) over 7 samples, throughput 6.7568874 rps
Average duration of 1039 ms (IO 36 ms) over 7 samples, throughput 6.7319846 rps
Average duration of 1037 ms (IO 34 ms) over 7 samples, throughput 6.7261376 rps
Average duration of 1042 ms (IO 38 ms) over 7 samples, throughput 6.720298 rps
^C
% cargo run --release --bin rayon -- --cpu-duration 1s --concurrency 7
    Finished `release` profile [optimized] target(s) in 1.89s
     Running `target/release/rayon --cpu-duration 1s --concurrency 7`
Average duration of 1042 ms (IO 37 ms) over 1 samples, throughput 0.9594442 rps
Average duration of 1044 ms (IO 41 ms) over 7 samples, throughput 6.8305006 rps
Average duration of 1028 ms (IO 27 ms) over 7 samples, throughput 6.8629923 rps
Average duration of 1023 ms (IO 22 ms) over 7 samples, throughput 6.8896527 rps
Average duration of 1019 ms (IO 18 ms) over 7 samples, throughput 6.9374223 rps
Average duration of 1027 ms (IO 22 ms) over 7 samples, throughput 6.8580756 rps
Average duration of 1018 ms (IO 16 ms) over 7 samples, throughput 6.847353 rps
Average duration of 1018 ms (IO 16 ms) over 7 samples, throughput 6.8783445 rps
^C
```

Honestly, I don't get why there would be a significant difference between the 2, as both apps seem to work the same way.
We have the main thread that spawns tasks, and those tasks are executed either on the 8 threads of tokio or the 7 threads of rayon (more on this later). In the rayon app the tokio worker threads don't do anything, do they? So I would explain the slight discrepancy with the different work-stealing logic of tokio and rayon. Where I do see a difference is `--concurrency 8`+:

```
% cargo run --release --bin tokio -- --cpu-duration 1s --concurrency 8
    Finished `release` profile [optimized] target(s) in 0.25s
     Running `target/release/tokio --cpu-duration 1s --concurrency 8`
Average duration of 1064 ms (IO 59 ms) over 1 samples, throughput 0.93955696 rps
Average duration of 1072 ms (IO 69 ms) over 8 samples, throughput 7.634239 rps
Average duration of 1045 ms (IO 43 ms) over 8 samples, throughput 7.6972647 rps
Average duration of 1040 ms (IO 37 ms) over 8 samples, throughput 7.66677 rps
Average duration of 1039 ms (IO 36 ms) over 8 samples, throughput 7.762669 rps
Average duration of 1037 ms (IO 34 ms) over 8 samples, throughput 7.6279187 rps
Average duration of 1044 ms (IO 40 ms) over 8 samples, throughput 7.680044 rps
Average duration of 1042 ms (IO 39 ms) over 8 samples, throughput 7.695585 rps
^C
% cargo run --release --bin rayon -- --cpu-duration 1s --concurrency 8
    Finished `release` profile [optimized] target(s) in 1.48s
     Running `target/release/rayon --cpu-duration 1s --concurrency 8`
Average duration of 1036 ms (IO 31 ms) over 1 samples, throughput 0.96456635 rps
Average duration of 1198 ms (IO 196 ms) over 7 samples, throughput 6.9807143 rps
Average duration of 1167 ms (IO 166 ms) over 7 samples, throughput 6.969797 rps
Average duration of 1158 ms (IO 156 ms) over 7 samples, throughput 6.9852505 rps
Average duration of 1145 ms (IO 143 ms) over 7 samples, throughput 6.9620543 rps
Average duration of 1150 ms (IO 148 ms) over 7 samples, throughput 6.962559 rps
Average duration of 1148 ms (IO 146 ms) over 7 samples, throughput 6.959907 rps
^C
```

But that's because of the aforementioned `.use_current_thread()` initialization of the rayon thread pool, which causes the main thread to be part of the rayon pool, but work stealing is not initialized there. Removing that line makes the 2 match again.
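
As a rough sanity check on the `--concurrency 8` numbers, here is a small std-only Rust sketch. It is not taken from the benchmark code. It assumes that the benchmark keeps exactly `--concurrency` requests in flight, that each request needs about 1 s of CPU, and that the rayon app effectively has 7 threads running tasks while tokio has 8. The helper names `capacity_rps` and `implied_latency_ms` are made up for illustration.

```rust
/// Upper bound on throughput (requests per second) when every request needs
/// `cpu_secs` of CPU time and at most `workers` threads actually run tasks.
fn capacity_rps(workers: u32, concurrency: u32, cpu_secs: f64) -> f64 {
    f64::from(workers.min(concurrency)) / cpu_secs
}

/// Little's law: requests in flight = throughput * latency,
/// so latency = requests in flight / throughput. Result is in milliseconds.
fn implied_latency_ms(concurrency: u32, throughput_rps: f64) -> f64 {
    f64::from(concurrency) / throughput_rps * 1000.0
}

fn main() {
    // Assumption: 7 effective rayon workers vs. 8 tokio workers, 1 s of CPU per request.
    println!("rayon capacity at concurrency 8: {:.2} rps", capacity_rps(7, 8, 1.0));
    println!("tokio capacity at concurrency 8: {:.2} rps", capacity_rps(8, 8, 1.0));

    // Throughput values copied from the last `--concurrency 8` log lines above.
    println!("rayon implied latency: {:.0} ms", implied_latency_ms(8, 6.962559));
    println!("tokio implied latency: {:.0} ms", implied_latency_ms(8, 7.695585));
}
```

Under those assumptions the rayon run is capped at about 7 rps, and the latency implied by the measured throughput lands within a few milliseconds of the logged averages (1150 ms for rayon, 1042 ms for tokio). That is what you would expect if the eighth request has to wait for one of only 7 busy workers.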