vatanrathi commented on issue #25991: URL: https://github.com/apache/beam/issues/25991#issuecomment-1494070150
@mosche Thanks a lot for agreeing to look into this; it has been causing a lot of trouble for us. I would also appreciate your input on connection reuse. We process very large files (some over 200 GB), and I have set up my `HttpClientConfiguration` as below:
```java
options.setHttpClientConfiguration(
    HttpClientConfiguration.builder()
        .connectionTimeout(1000 * 60 * 60 * 10) // 10 hours
        .socketTimeout(1000 * 60 * 60 * 10) // 10 hours
        .connectionMaxIdleTime(1000 * 10) // 10 seconds
        .build());
```
This is to ensure that we DO NOT REUSE a pooled connection that has been idle for more than 10 seconds, since S3 closes idle connections after 20 seconds, which could otherwise result in using an already-closed connection. The idle timeout matters because Beam processes data in bursts, so I think a connection gets evicted roughly every 10 seconds, which in turn invokes `close()`. Do you think this config is fine for our use case?

At this stage, I have upgraded to Beam 2.45.0 with Spark 3 and the AWS SDK v2. I have also patched the AWS SDK v2 to call `abort()` within the `close()` function, which gives me the best performance in my SIT environment. Do you think I can take that to prod until a proper fix/workaround is implemented in the Beam SDK?
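For context, here is a minimal sketch of the idea behind my patch. The wrapper class and its name are hypothetical, shown only to illustrate aborting instead of draining on close; `ResponseInputStream.abort()` is the SDK's existing API, but my actual change lives inside the SDK's stream `close()`:
```java
import java.io.FilterInputStream;
import java.io.IOException;

import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

// Hypothetical wrapper, for illustration only: on close(), abort the
// underlying HTTP connection instead of draining the remaining bytes
// of a potentially multi-GB S3 object.
class AbortingS3InputStream extends FilterInputStream {
  private final ResponseInputStream<GetObjectResponse> s3Stream;

  AbortingS3InputStream(ResponseInputStream<GetObjectResponse> s3Stream) {
    super(s3Stream);
    this.s3Stream = s3Stream;
  }

  @Override
  public void close() throws IOException {
    // abort() releases the connection immediately (it is not returned
    // to the pool); a plain close() would first read and discard the
    // rest of the response body.
    s3Stream.abort();
  }
}
```
The trade-off is that an aborted connection cannot be reused, but when only a small prefix of a 200 GB object has been read, dropping the connection is far cheaper than draining it.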
