[GitHub] [arrow-datafusion] yjshen commented on issue #1637: DiskManager Performs Blocking IO

GitBox Fri, 22 Apr 2022 00:33:09 -0700


yjshen commented on issue #1637:
URL: https://github.com/apache/arrow-datafusion/issues/1637#issuecomment-1106108822


   Could this be closed, based on the discussions in the #2226 Google Docs?
   
   > @tustvold I'm suggesting just doing the sync disk IO in the same
   > dedicated threadpool used for all query computation. In my benchmarks,
   > I've not seen a compelling advantage to offloading the IO elsewhere as
   > the loss of thread-locality hurts performance, the complexity cost is
   > high, and frankly most realistic workloads are not close to being IO
   > bound.
   > In the case of spilling to disk, one could make a convincing case that
   > you're actually memory bound and doing something else concurrently
   > would be actively detrimental...
   
   > @houqp I see, that's certainly an option. I don't have a good intuition
   > on whether this will be a more optimal setup or not without some
   > hands-on benchmarks. My understanding is that to make this work well,
   > we will need to create slightly more threads in the CPU thread pool to
   > reduce core idleness caused by IO, at the cost of slightly more context
   > switches resulting from preemptive scheduling.
   > I also don't like the complexity of async IO, so it's a good idea to
   > start with something simple, then benchmark and iterate from there.
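
To make the trade-off in the quoted discussion concrete, here is a minimal sketch, using only the Rust standard library, of a fixed worker pool in which each worker sorts a batch and then spills it with plain blocking IO on the same thread. `sort_and_spill`, `run_pool`, the `extra_threads` knob and the `spill-{id}.bin` file naming are all hypothetical and are not DataFusion's `DiskManager` API; `extra_threads` only stands in for the "slightly more threads" idea above.

```rust
use std::collections::VecDeque;
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: CPU work (sort) followed by a blocking spill,
/// both executed on whichever thread calls it. Returns bytes written.
fn sort_and_spill(mut batch: Vec<u64>, path: &Path) -> io::Result<u64> {
    batch.sort_unstable();
    let mut out = BufWriter::new(File::create(path)?);
    for v in &batch {
        out.write_all(&v.to_le_bytes())?;
    }
    out.flush()?;
    Ok(batch.len() as u64 * 8)
}

/// Hypothetical pool: `cores + extra_threads` workers pull batches from a
/// shared queue and do compute and sync IO on the same thread.
/// Returns the total number of bytes spilled.
fn run_pool(batches: Vec<Vec<u64>>, dir: &Path, extra_threads: usize) -> io::Result<u64> {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let workers = cores + extra_threads;
    let queue: Arc<Mutex<VecDeque<(usize, Vec<u64>)>>> =
        Arc::new(Mutex::new(batches.into_iter().enumerate().collect()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let dir = dir.to_path_buf();
            thread::spawn(move || -> io::Result<u64> {
                let mut written = 0;
                loop {
                    // Hold the lock only long enough to pop one task.
                    let task = queue.lock().unwrap().pop_front();
                    let Some((id, batch)) = task else { break };
                    // No hand-off to a separate IO pool: the spill blocks this worker.
                    written += sort_and_spill(batch, &dir.join(format!("spill-{id}.bin")))?;
                }
                Ok(written)
            })
        })
        .collect();

    let mut total = 0;
    for handle in handles {
        total += handle.join().expect("worker panicked")?;
    }
    Ok(total)
}
```

Calling `run_pool(batches, dir, 1)` runs one more worker than there are cores; whether that oversubscription actually helps is what would need benchmarking.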


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
