EpsilonPrime commented on issue #15233:
URL: https://github.com/apache/arrow/issues/15233#issuecomment-1477386944

   I have written a reproduction test case that detects the thread contention
issue (it is ready to check in once the fix is ready).  When a file is copied
(filesystem.cc:613), CopyStream runs as expected and the copy is then handed to
the close routine to complete.  The close routine delegates to CloseAsync,
which handles uploading parts by calling UploadPart.  UploadPart in turn adds
its work to the thread pool, which overloads the executor.  With an 8-thread
pool and 8 tasks (each small enough to fit in a single part), this produces 16
units of work competing for an executor of size 8.
   
   The easy solution is to limit the number of tasks submitted to the pool
(leaving just one thread free appears to be enough for the pool to drain,
although this needs verification).  The second option is to modify the close
routine so that it does its work on the existing thread instead of running
asynchronously.  This would require reworking at least 5 functions, and might
require even more work for the case where there are multiple parts per file
(which we do not have a test for yet).
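
   As a rough illustration of the second option, here is a minimal Python
sketch (again a toy model, not the Arrow implementation) in which the close
step runs the upload on the worker thread that is already doing the copy.  The
names `copy_files_with_sync_close`, `copy_file`, and `upload_part` are
hypothetical, and the single-part case is assumed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def copy_files_with_sync_close(pool_size, n_files):
    """Copy n_files on a pool; each 'close' uploads on the calling worker."""

    def upload_part(i):
        # Hypothetical stand-in for the real upload; reports the thread it ran on.
        return f"part-{i}", threading.get_ident()

    def copy_file(i):
        copy_thread = threading.get_ident()
        # Synchronous close: call the upload directly instead of pool.submit(),
        # so the copy never needs a second worker from the same pool.
        part, upload_thread = upload_part(i)
        return part, upload_thread == copy_thread

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() returns results in submission order.
        return list(pool.map(copy_file, range(n_files)))
```

   Because each copy submits only one task to the pool, 8 files on an 8-thread
pool occupy at most 8 workers in this model, and the second element of each
result confirms the upload ran on the copying thread.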
   


