Steve Loughran created HADOOP-19516:
---------------------------------------
             Summary: S3A: SDK reads content twice during PUT to S3 Express store
                 Key: HADOOP-19516
                 URL: https://issues.apache.org/jira/browse/HADOOP-19516
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.4.1, 3.4.2
         Environment: client in UK talking to an S3 Express bucket in us-west-2
            Reporter: Steve Loughran


During PUT calls, even of 0-byte objects, our UploadContentProviders class is reporting recreation of the input stream of an upload content provider, as seen by our INFO-level logging of this happening:

{code}
bin/hadoop fs -touchz $v3/4
2025-03-26 13:38:53,377 [main] INFO impl.UploadContentProviders (UploadContentProviders.java:newStream(289)) - Stream recreated: FileWithOffsetContentProvider{file=/tmp/hadoop-stevel/s3a/s3ablock-0001-659277820991634509.tmp, offset=0} BaseContentProvider{size=0, initiated at 2025-03-26T13:38:53.355, streamCreationCount=2, currentStream=null}
{code}

This code was added in HADOOP-19221, "S3A: Unable to recover from failure of multipart block upload attempt: Status Code: 400; Error Code: RequestTimeout". It logs at INFO because the event was considered both rare and serious enough to record, based on our hypothesis that it was triggered by a transient failure of the S3 service front end and the inability of the SDK to recover from it.

It turns out that uploading even a zero-byte file to an S3 Express store triggers the dual creation of the stream, apparently from dual signing. This *does not* happen on multipart uploads.
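For illustration, a minimal standalone sketch of one way to count stream creations during a single PUT, driving the SDK v2 {{ContentStreamProvider}} directly rather than going through S3A's provider classes. The class name and the bucket name (an S3 Express style directory bucket) are hypothetical; the expected creation counts are taken from the log above, not verified here.

{code}
import java.io.ByteArrayInputStream;
import java.util.concurrent.atomic.AtomicInteger;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.http.ContentStreamProvider;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

/**
 * Sketch: count how often the SDK asks for the request body stream
 * during a single PUT of a zero-byte object.
 */
public class StreamRecreationProbe {
  public static void main(String[] args) {
    final AtomicInteger creations = new AtomicInteger();

    // ContentStreamProvider has a single method, newStream(),
    // so a counting lambda is enough to observe every request.
    ContentStreamProvider provider = () -> {
      int count = creations.incrementAndGet();
      if (count > 1) {
        System.out.println("Stream recreated: creation #" + count);
      }
      return new ByteArrayInputStream(new byte[0]);
    };

    try (S3Client s3 = S3Client.create()) {
      s3.putObject(
          PutObjectRequest.builder()
              .bucket("example--usw2-az1--x-s3") // hypothetical S3 Express directory bucket
              .key("4")
              .build(),
          RequestBody.fromContentProvider(provider, 0, "application/octet-stream"));
    }

    // Against a classic S3 bucket a single creation is expected; the
    // report above suggests an S3 Express store yields two.
    System.out.println("Total stream creations: " + creations.get());
  }
}
{code}

Running the same probe against a classic S3 bucket should show one creation; a count of two only against the S3 Express endpoint would support the dual-signing hypothesis.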