steveniemitz opened a new issue, #22431:
URL: https://github.com/apache/beam/issues/22431

   ### What happened?
   
   Due to an idiosyncrasy in how the Google API client's batch API works, calls
made through the batch API do not use a pooled HTTP connection (more precisely,
they do not leave the connection in a state where it can be returned to the
pool), so each call requires a new connection. For operations that issue many
getObject calls (match, etc.), this can leave a significant number of sockets in
TIME_WAIT and may even lead to socket/file-descriptor exhaustion.
   
   Because of the way FileSystem and GcsUtil interact, getObjects is generally
called with only a single element. We can optimize that case by directing it to
the single-object API instead, which does pool connections correctly.
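
A minimal sketch of the proposed routing, assuming a client that exposes both a single-object call and a batched call. `ObjectClient`, `ObjectOrException`, `ObjectLookup`, and their methods are hypothetical stand-ins, not Beam's actual `GcsUtil` API, and object metadata is simplified to a `String`. It also assumes that per-object failures should be reported inside the result list (as a batch would report them) rather than thrown:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result holder: either an object's metadata or the IOException from fetching it. */
final class ObjectOrException {
  final String metadata; // hypothetical stand-in for real object metadata
  final IOException error;

  private ObjectOrException(String metadata, IOException error) {
    this.metadata = metadata;
    this.error = error;
  }

  static ObjectOrException of(String metadata) { return new ObjectOrException(metadata, null); }

  static ObjectOrException of(IOException error) { return new ObjectOrException(null, error); }
}

/** Hypothetical client with both a single-object call and a batched call. */
interface ObjectClient {
  /** Single-object request; assumed to return its connection to the pool. */
  String getObject(String path) throws IOException;

  /** Batched request; per this issue, the connection is not returned to the pool. */
  List<ObjectOrException> getObjectsBatch(List<String> paths) throws IOException;
}

final class ObjectLookup {
  private ObjectLookup() {}

  /**
   * Fetches metadata for each path. A single-element request is routed to the
   * single-object call so it can reuse a pooled connection; larger requests still batch.
   */
  static List<ObjectOrException> getObjects(ObjectClient client, List<String> paths)
      throws IOException {
    if (paths.size() == 1) {
      List<ObjectOrException> result = new ArrayList<>(1);
      try {
        result.add(ObjectOrException.of(client.getObject(paths.get(0))));
      } catch (IOException e) {
        // Assumption: surface the failure in the result list, as a batch response would.
        result.add(ObjectOrException.of(e));
      }
      return result;
    }
    return client.getObjectsBatch(paths);
  }
}
```

Keeping the return type identical for both paths means callers such as a match implementation do not need to know which API served the request; only the single-element case changes its transport.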
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-java-gcp
