Steve Loughran created HADOOP-19516:
---------------------------------------

             Summary: S3A: SDK reads content twice during PUT to S3 Express store.
                 Key: HADOOP-19516
                 URL: https://issues.apache.org/jira/browse/HADOOP-19516
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.4.1, 3.4.2
         Environment: client in UK talking to S3 Express bucket in us-west-2
            Reporter: Steve Loughran


During PUT calls, even of 0-byte objects, UploadContentProviders is reporting 
a recreation of the input stream of an UploadContentProvider, as seen by our 
INFO-level logging of this happening:

{code}
 bin/hadoop fs -touchz $v3/4
2025-03-26 13:38:53,377 [main] INFO  impl.UploadContentProviders (UploadContentProviders.java:newStream(289)) - Stream recreated: FileWithOffsetContentProvider{file=/tmp/hadoop-stevel/s3a/s3ablock-0001-659277820991634509.tmp, offset=0} BaseContentProvider{size=0, initiated at 2025-03-26T13:38:53.355, streamCreationCount=2, currentStream=null}
{code}
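
The detection behind that message is simple: the content provider counts how 
many times the SDK asks for a fresh stream over the same block, and logs once 
the count passes one. A minimal sketch of that counting logic, with 
illustrative names rather than the real UploadContentProviders internals:

{code}
// Sketch only, not the actual Hadoop source; names are illustrative.
import java.io.IOException;
import java.io.InputStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public abstract class SketchContentProvider {
  private static final Logger LOG =
      LoggerFactory.getLogger(SketchContentProvider.class);

  private int streamCreationCount;
  private InputStream currentStream;

  /** Called by the SDK whenever it wants to (re)read the content. */
  public final InputStream newStream() {
    close();                          // discard any previous stream
    streamCreationCount++;
    if (streamCreationCount > 1) {
      // The SDK is reading the content more than once, e.g. for a
      // retry after a transient failure, or a second signing pass.
      LOG.info("Stream recreated: {}", this);
    }
    currentStream = createNewStream();
    return currentStream;
  }

  public void close() {
    if (currentStream != null) {
      try {
        currentStream.close();
      } catch (IOException ignored) {
        // best-effort cleanup
      }
      currentStream = null;
    }
  }

  /** Open a fresh stream over the (possibly buffered) block data. */
  protected abstract InputStream createNewStream();
}
{code}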

This code was added in HADOOP-19221, S3A: Unable to recover from failure of 
multipart block upload attempt "Status Code: 400; Error Code: RequestTimeout"; 
it logs at INFO because the event was considered both rare and serious enough 
to record, based on our hypothesis that it was triggered by a transient 
failure of the S3 service front end and the inability of the SDK to recover 
from it.

It turns out that uploading even a zero-byte file to an S3 Express store 
triggers this dual creation of the stream, apparently because the request 
payload is signed twice.
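
The same recreation can be triggered without the CLI. A hedged reproduction 
sketch through the FileSystem API follows; the S3 Express directory bucket 
name is a placeholder, and each zero-byte PUT should emit one "Stream 
recreated" message:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TouchzRepro {
  public static void main(String[] args) throws Exception {
    // placeholder S3 Express directory bucket name; substitute a real one
    Path path = new Path("s3a://example--usw2-az1--x-s3/4");
    try (FileSystem fs = FileSystem.get(path.toUri(), new Configuration())) {
      // zero-byte PUT, equivalent to "hadoop fs -touchz"
      fs.create(path, true).close();
    }
  }
}
{code}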

This *does not* happen on multipart uploads.


