vatanrathi commented on issue #25991: URL: https://github.com/apache/beam/issues/25991#issuecomment-1494070150
@mosche Thanks a lot for agreeing to look into this; it has been causing a lot of trouble for us. I would also appreciate your input on connection reuse. We process very large files (some over 200 GB), and I have set up my `HttpClientConfiguration` as below:
```java
options.setHttpClientConfiguration(
    HttpClientConfiguration.builder()
        .connectionTimeout(1000 * 60 * 60 * 10) // 10 hours
        .socketTimeout(1000 * 60 * 60 * 10) // 10 hours
        .connectionMaxIdleTime(1000 * 10) // 10 seconds
        .build());
```
This is to ensure that we DO NOT REUSE a pooled connection that has been idle for more than 10 seconds, since S3 closes idle connections after 20 seconds, which could otherwise result in using an already-closed connection. The idle timeout matters because Beam processes data in bursts, so I think a connection gets evicted roughly every 10 seconds, which in turn invokes `close()`. Do you think this config is fine for our use case?

At this stage, I have upgraded to Beam 2.45.0 with Spark 3 and the AWS SDK v2. I have also patched the AWS SDK v2 to call `abort()` within the `close()` function, which gives me the best performance in my SIT environment. Do you think I can take that to prod until a proper fix/workaround is implemented in the Beam SDK?
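For context, here is a minimal sketch of the idea behind my patch. The wrapper class and its name are hypothetical, shown only to illustrate aborting instead of draining on close; `ResponseInputStream.abort()` is the SDK's existing API, but my actual change lives inside the SDK's stream `close()`:
```java
import java.io.FilterInputStream;
import java.io.IOException;

import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

// Hypothetical wrapper, for illustration only: on close(), abort the
// underlying HTTP connection instead of draining the remaining bytes
// of a potentially multi-GB S3 object.
class AbortingS3InputStream extends FilterInputStream {
  private final ResponseInputStream<GetObjectResponse> s3Stream;

  AbortingS3InputStream(ResponseInputStream<GetObjectResponse> s3Stream) {
    super(s3Stream);
    this.s3Stream = s3Stream;
  }

  @Override
  public void close() throws IOException {
    // abort() releases the connection immediately (it is not returned
    // to the pool); a plain close() would first read and discard the
    // rest of the response body.
    s3Stream.abort();
  }
}
```
The trade-off is that an aborted connection cannot be reused, but when only a small prefix of a 200 GB object has been read, dropping the connection is far cheaper than draining it.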
