tustvold commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2543174655
I've pushed a simple example to [io_stall](https://github.com/tustvold/io_stall/blob/main/src/rayon.rs) that glues together [rayon](https://docs.rs/rayon/latest/rayon/) and [async_task](https://docs.rs/async-task/latest/async_task/) to yield an async scheduler that can accommodate CPU-bound tasks whilst not starving IO.

```
cargo run --release --bin tokio -- --cpu-duration 1s --concurrency 7
    Finished `release` profile [optimized] target(s) in 0.05s
     Running `target/release/tokio --cpu-duration 1s --concurrency 7`
Average duration of 1002 ms (IO 2 ms) over 1 samples, throughput 0.9975502 rps
Average duration of 2002 ms (IO 1002 ms) over 1 samples, throughput 0.9996063 rps
Average duration of 3002 ms (IO 2002 ms) over 1 samples, throughput 0.9998498 rps
Average duration of 3000 ms (IO 2000 ms) over 1 samples, throughput 0.999554 rps
Average duration of 5003 ms (IO 4003 ms) over 1 samples, throughput 0.9995086 rps
Average duration of 6003 ms (IO 5003 ms) over 1 samples, throughput 0.9998254 rps
Average duration of 4001 ms (IO 3001 ms) over 1 samples, throughput 0.99941516 rps
```

vs

```
cargo run --release --bin rayon -- --cpu-duration 1s --concurrency 7
   Compiling io_stall v0.1.0 (/home/raphael/repos/scratch/io_stall)
    Finished `release` profile [optimized] target(s) in 0.45s
     Running `target/release/rayon --cpu-duration 1s --concurrency 7`
Average duration of 1002 ms (IO 2 ms) over 1 samples, throughput 0.9976903 rps
Average duration of 1002 ms (IO 2 ms) over 7 samples, throughput 6.994286 rps
Average duration of 1000 ms (IO 0 ms) over 7 samples, throughput 6.9929976 rps
Average duration of 1000 ms (IO 0 ms) over 7 samples, throughput 6.9927454 rps
Average duration of 1000 ms (IO 0 ms) over 7 samples, throughput 6.994525 rps
Average duration of 1000 ms (IO 0 ms) over 7 samples, throughput 6.993674 rps
Average duration of 1000 ms (IO 0 ms) over 7 samples, throughput 6.993697 rps
Average duration of 1000 ms (IO 0 ms) over 7 samples, throughput 6.9924793 rps
```

I'm sure there are ways to improve this, but I think it has some quite interesting properties, in particular:

* Mostly a drop-in replacement for `tokio::spawn`
* Less than 100 lines of code to maintain
* Avoids IO starvation
* Allows using rayon's very ergonomic parallelism options
* Preserves the thread-locality originating from the way non-blocking operators are recursively "composed"

However, it is important to highlight that with this approach IO will still degrade poorly once CPU resources are saturated. Where there is a clear IO boundary, e.g. `AsyncFileReader::get_bytes`, it may still be worthwhile to spawn that as a dedicated task so that it can run to completion without needing to "find time" on the rayon pool. That said, this can be done as an optimisation if people run into such issues.

Ultimately this fixes the major issue where concurrency nose-dives long before CPU resources are saturated, with limited shenanigans.
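For anyone curious what the glue amounts to, a minimal sketch is below. This is illustrative only and not the actual io_stall code: `spawn_rayon` is a made-up name, and the sketch just assumes the `async-task` and `rayon` crates. The idea is that `async_task`'s schedule callback hands every wake-up of the future to rayon as an ordinary pool job, so each poll competes fairly with the rest of the rayon work.

```rust
use async_task::{Runnable, Task};
use std::future::Future;

/// Hypothetical helper: spawn a future onto rayon's global thread pool.
/// Each time the future is woken, the next poll runs as a rayon job.
pub fn spawn_rayon<F>(future: F) -> Task<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    // The schedule callback is invoked whenever the task needs polling;
    // push that work onto the rayon pool instead of a dedicated executor.
    let schedule = |runnable: Runnable| {
        rayon::spawn(move || {
            // Polls the future once; re-scheduling on wake happens via the
            // callback above.
            runnable.run();
        });
    };

    let (runnable, task) = async_task::spawn(future, schedule);
    // Kick off the first poll.
    runnable.schedule();
    task
}
```

The returned `Task` can be awaited or detached much like a `JoinHandle`, which is what makes the approach mostly a drop-in replacement for `tokio::spawn`.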