tustvold commented on PR #4040:
URL: https://github.com/apache/arrow-rs/pull/4040#issuecomment-1724082933

   Looping back to this, I think this problem is ill-formed. There are two major 
use-cases for this functionality:
   
   1. Supporting object_store on threads not managed by tokio
   2. Supporting object_store in systems containing multiple thread pools with 
different tail latencies
   
   The first use-case is better served by integrating tokio at a higher level, 
e.g. using `Handle::enter` at the thread level.
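
   A minimal sketch of this, assuming the application owns a tokio runtime at a 
higher level, might look like the following (the thread and the computation are 
illustrative, not object_store code):

   ```rust
   use std::thread;
   use tokio::runtime::Runtime;

   // Sketch: a thread not managed by tokio enters the runtime's context so
   // that tokio APIs work there. The runtime itself is assumed to be owned
   // at a higher level of the application.
   fn run_on_external_thread() -> u64 {
       let runtime = Runtime::new().unwrap();
       let handle = runtime.handle().clone();

       let worker = thread::spawn(move || {
           // Entering the handle makes this thread a valid tokio context,
           // so `tokio::spawn` can be called from it.
           let _guard = handle.enter();
           let task = tokio::spawn(async { 40u64 + 2 });
           // Block this (non-tokio) thread until the spawned task finishes.
           handle.block_on(task).unwrap()
       });

       worker.join().unwrap()
   }

   fn main() {
       assert_eq!(run_on_external_thread(), 42);
   }
   ```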
   
   It is unclear how to handle the second use-case at a library level. The use 
of a second threadpool implies that the primary threadpool may have very high 
tail latencies, and the problem is determining at what point this should result 
in back pressure on the underlying TCP connection. As written, this PR does not 
change where that backpressure occurs: should the task not get scheduled on the 
high-tail-latency threadpool, nothing will drain the TCP socket, and TCP 
backpressure will occur. The approach in 
https://github.com/apache/arrow-rs/pull/4015 instead uses a queue with capacity 
for a single chunk, which delays this TCP backpressure only slightly. You could 
increase the queue size, or build a more sophisticated queue that buffers a 
given number of bytes, but it is unclear how this should be controlled.
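
   The single-chunk queue idea can be sketched with a bounded channel 
(assuming tokio's mpsc; the producer and chunk contents are fabricated 
stand-ins, not the actual #4015 implementation):

   ```rust
   use tokio::sync::mpsc;

   // Sketch of the queue-based decoupling described above: an IO task feeds
   // chunks into a channel with capacity 1. If the consumer stalls, `send`
   // waits once a single chunk is buffered; in a real system that pause
   // stops draining the TCP socket, and TCP backpressure takes over.
   async fn produce(tx: mpsc::Sender<Vec<u8>>) {
       for i in 0..4u8 {
           // Stand-in for a chunk read off the network.
           let chunk = vec![i; 1024];
           // Suspends here while one chunk is already queued and unconsumed.
           if tx.send(chunk).await.is_err() {
               break; // consumer went away
           }
       }
   }

   fn total_bytes_received() -> usize {
       let rt = tokio::runtime::Runtime::new().unwrap();
       rt.block_on(async {
           let (tx, mut rx) = mpsc::channel(1);
           tokio::spawn(produce(tx));
           let mut total = 0;
           while let Some(chunk) = rx.recv().await {
               total += chunk.len();
           }
           total
       })
   }

   fn main() {
       assert_eq!(total_bytes_received(), 4 * 1024);
   }
   ```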
   
   Taking a step back, this feels like the wrong way to solve the problem: 
ultimately IO should be segregated from compute at a meaningful application 
task boundary, rather than at the object_store interface. For example, 
AsyncFileReader::get_bytes could perform the IO to fetch a given chunk of data 
on a separate thread pool. This avoids object_store having to make decisions 
about how much buffering is too much, etc.
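
   A rough sketch of that shape, assuming a dedicated IO runtime (the 
`get_bytes_on_io_pool` name and the fabricated fetch are hypothetical, not an 
object_store or parquet API):

   ```rust
   use std::ops::Range;
   use tokio::runtime::Runtime;

   // Sketch: perform the fetch for a byte range on a dedicated IO runtime,
   // returning the bytes to the caller, which may live on a compute pool.
   fn get_bytes_on_io_pool(io: &Runtime, range: Range<usize>) -> Vec<u8> {
       let task = io.spawn(async move {
           // Stand-in for the actual fetch, e.g. an object store range
           // request; here we just fabricate `range.len()` bytes.
           vec![0u8; range.len()]
       });
       // Parking the calling (compute) thread on the JoinHandle does not tie
       // up an IO worker; the fetch itself runs on the IO runtime.
       io.block_on(task).unwrap()
   }

   fn main() {
       let io = Runtime::new().unwrap();
       let bytes = get_bytes_on_io_pool(&io, 0..16);
       assert_eq!(bytes.len(), 16);
   }
   ```

   The key design point is that the boundary between the two pools sits at an 
application-meaningful unit (one chunk fetch), so any buffering policy lives 
with the caller rather than inside object_store.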
   
   I am therefore going to close this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]