vatanrathi commented on issue #25991:
URL: https://github.com/apache/beam/issues/25991#issuecomment-1493309066

   @aromanenko-dev Sorry if I was not clear before. Let me explain.
   
   We are currently on Beam 2.23.0, and the given job finishes in around 
10 min. I tried to upgrade to 2.45.0 and noticed performance issues with both AWS 
SDK v1 and v2. So I upgraded step by step, and that is how I 
found that performance started to degrade from version 2.31.0. That is also where I 
noticed this change, which I believe is the root cause.
   
   Below are my final findings, based on several iterations of tests.
   
   1. With AWS SDK v1, if the drainInputStream call is removed from close(), 
execution time is the same across versions.
   2. With SDK v2, however, with the drainInputStream call in close(), the pipeline 
runs for hours, whereas it takes only ~10 min on AWS SDK v1. If the drainInputStream 
call is removed, performance improves, but the pipeline still takes ~30 min to finish. 
If s3ResponseInputStream.abort() is called before s3ResponseInputStream.close() in 
close(), performance improves significantly and the pipeline finishes 
within 3 minutes.
   
   ```
     @Override
     public void close() throws IOException {
       if (s3ResponseInputStream != null) {
         s3ResponseInputStream.abort(); // added: abort before draining and closing
         drainInputStream(s3ResponseInputStream);
         s3ResponseInputStream.close();
       }
       open = false;
     }
   
   ```
   I also found an issue, https://github.com/aws/aws-sdk-java-v2/issues/2117, 
raised against aws-sdk-java-v2 about the close() call; it likewise reports that 
close() unexpectedly waits.
   
   As for your question, "I'm wondering if it's even possible that close() will be 
called under normal circumstances before all data is read?", I don't know the 
exact answer, but I think that because Beam reads data in bursts, an attempt is made 
to close the S3 connection while the data read in the first fetch is still being processed.
   
   If you think we can avoid the close() call by tweaking some HTTP connection 
parameter in the pipeline options, or in some other way, kindly let me know.
   
   
   
   

