vatanrathi commented on issue #25991:
URL: https://github.com/apache/beam/issues/25991#issuecomment-1491229726

   @iemejia You may be right that there could be an underlying issue with the Amazon SDK.
    
   Here is what I have tried so far:
    
   1. **beam-sdks-java-io-amazon-web-services** - I tried patching close() to remove the "drainInputStream" call. With that patch, performance is the same across all the latest versions, but the earlier AWS warning about "Not all bytes read" comes back.
    
   2. **beam-sdks-java-io-amazon-web-services2** - Applying the same patch to skip draining improved performance, but it was still much worse than with sdk1. I noticed that closing the ResponseInputStream appears to wait for a long time: in a sample test it took around 6 minutes to close. So I added an "abort()" call before close/drain and, to my surprise, performance improved significantly, to the level I would expect from the latest Beam + Spark 3.
    
   _The logs below suggest that the program waited ~**6 min** for the ResponseInputStream to close:_

   **21:27:23**  dtime="2023-03-30 21:27:15.978", thread="idle-connection-reaper", lvl="DEBUG", logger="software.amazon.awssdk.http.apache.internal.net.SdkSslSocket", ctx="debug", jobId="xxxxx", executionId="xxxxx", closing [xxxxx.s3.ap-southeast-2.amazonaws.com/52.95.131.46:443](http://xxxxx.s3.ap-southeast-2.amazonaws.com/52.95.131.46:443)

   **21:33:44**  dtime="2023-03-30 21:33:33.406", thread="Executor task launch worker for task 4.0 in stage 0.0 (TID 4)", lvl="INFO", logger="org.apache.spark.storage.memory.MemoryStore", ctx="logInfo", jobId="", executionId="", Block rdd_8_4 stored as values in memory (estimated size 67.4 MiB, free 15.8 GiB)
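As a quick check of the ~6 min figure, this sketch computes the gap between the two `dtime` values quoted in the logs using `java.time`. The class and method names (`LogGap`, `gap`) are illustrative only, and the result is just the time between the two logged events, which the comment uses as a proxy for how long the close took.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // Matches the dtime format in the log lines, e.g. "2023-03-30 21:27:15.978".
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Returns the elapsed time between two dtime values.
    static Duration gap(String from, String to) {
        return Duration.between(
            LocalDateTime.parse(from, FMT),
            LocalDateTime.parse(to, FMT));
    }

    public static void main(String[] args) {
        Duration d = gap("2023-03-30 21:27:15.978", "2023-03-30 21:33:33.406");
        // Prints "PT6M17.428S = 377428 ms", i.e. roughly 6 min 17 s.
        System.out.println(d + " = " + d.toMillis() + " ms");
    }
}
```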
    
   After adding an "abort" call before draining 
(https://github.com/apache/beam/blob/master/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ReadableSeekableByteChannel.java#L168)
 on sdk2, I no longer observed any wait.
   However, I am not sure whether adding an "abort" call could cause problems for my program, or whether it is a bad choice.
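To illustrate why calling abort first can remove the wait, here is a minimal, self-contained model under stated assumptions. `ModelResponseStream`, `closeWithAbort` and `closeWithoutAbort` are hypothetical names; this is not Beam's `S3ReadableSeekableByteChannel` or the AWS SDK class. As a simplification, we assume `close()` reads the rest of the response body (so the pooled HTTP connection could be reused), and that after `abort()` it skips that read and just drops the connection. AWS SDK v2's `ResponseInputStream` does expose an `abort()` method, but the model below does not use the SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AbortBeforeClose {
    // Hypothetical model of a response stream backed by a pooled connection:
    // close() drains the remaining body unless abort() was called first.
    static class ModelResponseStream extends FilterInputStream {
        private boolean aborted = false;
        long drainedOnClose = 0; // bytes read just to be able to close

        ModelResponseStream(InputStream body) {
            super(body);
        }

        void abort() {
            aborted = true; // model: connection dropped, nothing left to read
        }

        @Override
        public void close() throws IOException {
            if (!aborted) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    drainedOnClose += n;
                }
            }
            super.close();
        }
    }

    // The pattern described in the comment: abort() before close().
    static long closeWithAbort(ModelResponseStream s) throws IOException {
        s.abort();
        s.close();
        return s.drainedOnClose;
    }

    // Plain close(): pays for reading whatever is left of the object.
    static long closeWithoutAbort(ModelResponseStream s) throws IOException {
        s.close();
        return s.drainedOnClose;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = new byte[1_000_000]; // illustrative unread object size
        System.out.println("without abort: " + closeWithoutAbort(
            new ModelResponseStream(new ByteArrayInputStream(body)))); // 1000000
        System.out.println("with abort:    " + closeWithAbort(
            new ModelResponseStream(new ByteArrayInputStream(body)))); // 0
    }
}
```

On the "bad choice" question, the trade-off in this model is that an aborted connection cannot go back to the pool, so the next request pays for a new connection (TCP/TLS setup), whereas draining pays for reading the rest of the object. Which is cheaper depends on how much of the object is left unread when the stream is closed.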


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
