steveniemitz opened a new issue, #22431: URL: https://github.com/apache/beam/issues/22431
### What happened?

Due to an idiosyncrasy of how the Google API client's batch API works, calls made through it do not use a pooled HTTP connection (more accurately, they do not leave the connection in a state where it can be returned to the pool) and instead require a new connection each time. This can leave a significant number of sockets in TIME_WAIT for operations that issue many getObject calls (match, etc.), possibly even leading to socket/fd exhaustion.

The interaction between the FileSystem and GcsUtil is such that getObjects is generally only called with a single element, so we can optimize here and direct that case to the single-object API instead, which does pool correctly.

### Issue Priority

Priority: 2

### Issue Component

Component: io-java-gcp
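A minimal sketch of the proposed fast path, to make the routing concrete. It is not the actual GcsUtil code: `ObjectClient`, `getOne`, `getBatch`, and `RecordingClient` are hypothetical stand-ins for the single-object and batch code paths, and metadata is modeled as a plain string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SingleObjectFastPath {

  /** Hypothetical stand-in for the two request paths; not a Beam or Google API type. */
  interface ObjectClient {
    /** Single-object request: assumed to reuse a pooled HTTP connection. */
    String getOne(String path);

    /** Batch request: assumed to open a new connection per call. */
    List<String> getBatch(List<String> paths);
  }

  /** Routes a one-element lookup to the single-object path, everything else to batch. */
  static List<String> getObjects(ObjectClient client, List<String> paths) {
    if (paths.isEmpty()) {
      return Collections.emptyList();
    }
    if (paths.size() == 1) {
      // Common case per the issue: avoid the batch API so the connection is pooled.
      return Collections.singletonList(client.getOne(paths.get(0)));
    }
    return client.getBatch(paths);
  }

  /** Fake client that records which path each call took. */
  static class RecordingClient implements ObjectClient {
    final List<String> calls = new ArrayList<>();

    @Override
    public String getOne(String path) {
      calls.add("single");
      return "meta:" + path;
    }

    @Override
    public List<String> getBatch(List<String> paths) {
      calls.add("batch");
      List<String> out = new ArrayList<>();
      for (String p : paths) {
        out.add("meta:" + p);
      }
      return out;
    }
  }

  public static void main(String[] args) {
    RecordingClient client = new RecordingClient();
    System.out.println(getObjects(client, Arrays.asList("gs://bucket/a")));
    System.out.println(getObjects(client, Arrays.asList("gs://bucket/a", "gs://bucket/b")));
    System.out.println(client.calls);
  }
}
```

The batch path is kept for multi-element calls, so behavior only changes for the single-element case that the FileSystem usually produces.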
