[ https://issues.apache.org/jira/browse/HADOOP-19516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-19516:
------------------------------------

Description:

The commits on trunk ({{60c7d4fea010}}) and on branch 3.4 ({{f3ec55b}}) fix the logging, but do not address the underlying issue.

h2. Problem

During PUT calls, even of 0-byte objects, recreation of the input stream of an UploadContentProvider is being reported, as seen in this INFO-level log message:

{code}
bin/hadoop fs -touchz $v3/4
2025-03-26 13:38:53,377 [main] INFO impl.UploadContentProviders (UploadContentProviders.java:newStream(289)) - Stream recreated: FileWithOffsetContentProvider{file=/tmp/hadoop-stevel/s3a/s3ablock-0001-659277820991634509.tmp, offset=0} BaseContentProvider{size=0, initiated at 2025-03-26T13:38:53.355, streamCreationCount=2, currentStream=null}
{code}

This code was added in HADOOP-19221 ("S3A: Unable to recover from failure of multipart block upload attempt: Status Code: 400; Error Code: RequestTimeout"). It logs at INFO because stream recreation was considered both rare and serious enough to report, based on our hypothesis that it was triggered by a transient failure of the S3 service front end and the inability of the SDK to recover from it.

It turns out that uploading even a zero-byte file to S3 triggers the dual creation of the stream, apparently due to dual signing.

This *does not* happen on multipart uploads.


> S3A: SDK reads content twice during PUT to S3 Express store.
> -------------------------------------------------------------
>
>                 Key: HADOOP-19516
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19516
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.4.1, 3.4.2
>         Environment: client in UK talking to S3 Express bucket in us-w-2
>            Reporter: Steve Loughran
>            Priority: Major
>             Fix For: 3.5.0, 3.4.2
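To make the diagnostic above concrete, here is a minimal sketch of what a stream-creation counter looks like. This is not the Hadoop UploadContentProviders implementation; the class name and log message are invented for illustration. It is built on the AWS SDK v2 {{software.amazon.awssdk.http.ContentStreamProvider}} interface, whose {{newStream()}} method the SDK calls whenever it needs to (re)read the request content:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicInteger;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import software.amazon.awssdk.http.ContentStreamProvider;

/**
 * Illustrative sketch only (hypothetical class, not the S3A code):
 * a ContentStreamProvider that counts how often the SDK asks for the
 * content stream and logs at INFO on every creation after the first,
 * mirroring the diagnostic HADOOP-19221 added.
 */
public class CountingContentProvider implements ContentStreamProvider {

  private static final Logger LOG =
      LoggerFactory.getLogger(CountingContentProvider.class);

  private final byte[] data;
  private final AtomicInteger streamCreationCount = new AtomicInteger();

  public CountingContentProvider(byte[] data) {
    this.data = data;
  }

  @Override
  public InputStream newStream() {
    int count = streamCreationCount.incrementAndGet();
    if (count > 1) {
      // A second call means the SDK is reading the content again:
      // either a retry after a failure, or (as this issue shows)
      // a second read during a single, successful PUT.
      LOG.info("Stream recreated: streamCreationCount={}", count);
    }
    return new ByteArrayInputStream(data);
  }

  public int getStreamCreationCount() {
    return streamCreationCount.get();
  }
}
{code}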
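A standalone way to observe the dual read, independent of the S3A connector, would be to pass such a provider to a plain SDK v2 PUT. This is a hypothetical reproduction sketch, using the counting provider above and placeholder bucket/key names (the bucket name follows the S3 Express directory-bucket naming convention); if the behaviour described above holds, the count comes back as 2 for a single, unretried, successful upload:

{code:java}
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

/**
 * Hypothetical reproduction sketch: upload a zero-byte object through
 * the counting provider and print how often the SDK opened the stream.
 */
public class PutStreamCountRepro {
  public static void main(String[] args) {
    CountingContentProvider provider =
        new CountingContentProvider(new byte[0]);

    try (S3Client s3 = S3Client.builder()
        .region(Region.US_WEST_2)
        .build()) {
      s3.putObject(
          PutObjectRequest.builder()
              .bucket("example-bucket--usw2-az1--x-s3")  // placeholder S3 Express bucket
              .key("touchz-test")                        // placeholder key
              .build(),
          RequestBody.fromContentProvider(
              provider, 0, "application/octet-stream"));
    }

    // If the dual-read behaviour occurs, this prints 2 even though
    // nothing failed and nothing was retried.
    System.out.println("streamCreationCount = "
        + provider.getStreamCreationCount());
  }
}
{code}

Running the same provider through a multipart upload would be expected to show one stream creation per part, matching the observation that the dual read does not occur on multipart uploads.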