[ https://issues.apache.org/jira/browse/HADOOP-19516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-19516:
------------------------------------

Description:

The commits on trunk ({{60c7d4fea010}}) and on branch 3.4 ({{f3ec55b}}) fix the logging, but do not address the underlying issue.

h2. Problem

During PUT calls, even of 0-byte objects, recreation of the input stream of an UploadContentProvider is being reported, as seen in this INFO-level log message:

{code}
bin/hadoop fs -touchz $v3/4
2025-03-26 13:38:53,377 [main] INFO impl.UploadContentProviders (UploadContentProviders.java:newStream(289)) - Stream recreated: FileWithOffsetContentProvider{file=/tmp/hadoop-stevel/s3a/s3ablock-0001-659277820991634509.tmp, offset=0} BaseContentProvider{size=0, initiated at 2025-03-26T13:38:53.355, streamCreationCount=2, currentStream=null}
{code}

This code was added in HADOOP-19221 ("S3A: Unable to recover from failure of multipart block upload attempt: Status Code: 400; Error Code: RequestTimeout"). It logs at INFO because stream recreation was considered both rare and serious enough to report, based on our hypothesis that it was triggered by a transient failure of the S3 service front end and the inability of the SDK to recover from it.

It turns out that uploading even a zero-byte file to S3 triggers the dual creation of the stream, apparently due to dual signing.

This *does not* happen on multipart uploads.


> S3A: SDK reads content twice during PUT to S3 Express store.
> -------------------------------------------------------------
>
>                 Key: HADOOP-19516
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19516
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.4.1, 3.4.2
>         Environment: client in UK talking to S3 Express bucket in us-w-2
>            Reporter: Steve Loughran
>            Priority: Major
>             Fix For: 3.5.0, 3.4.2
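To make the diagnostic above concrete, here is a minimal sketch of what a stream-creation counter looks like. This is not the Hadoop UploadContentProviders implementation; the class name and log message are invented for illustration. It is built on the AWS SDK v2 {{software.amazon.awssdk.http.ContentStreamProvider}} interface, whose {{newStream()}} method the SDK calls whenever it needs to (re)read the request content:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicInteger;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import software.amazon.awssdk.http.ContentStreamProvider;

/**
 * Illustrative sketch only (hypothetical class, not the S3A code):
 * a ContentStreamProvider that counts how often the SDK asks for the
 * content stream and logs at INFO on every creation after the first,
 * mirroring the diagnostic HADOOP-19221 added.
 */
public class CountingContentProvider implements ContentStreamProvider {

  private static final Logger LOG =
      LoggerFactory.getLogger(CountingContentProvider.class);

  private final byte[] data;
  private final AtomicInteger streamCreationCount = new AtomicInteger();

  public CountingContentProvider(byte[] data) {
    this.data = data;
  }

  @Override
  public InputStream newStream() {
    int count = streamCreationCount.incrementAndGet();
    if (count > 1) {
      // A second call means the SDK is reading the content again:
      // either a retry after a failure, or (as this issue shows)
      // a second read during a single, successful PUT.
      LOG.info("Stream recreated: streamCreationCount={}", count);
    }
    return new ByteArrayInputStream(data);
  }

  public int getStreamCreationCount() {
    return streamCreationCount.get();
  }
}
{code}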
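A standalone way to observe the dual read, independent of the S3A connector, would be to pass such a provider to a plain SDK v2 PUT. This is a hypothetical reproduction sketch, using the counting provider above and placeholder bucket/key names (the bucket name follows the S3 Express directory-bucket naming convention); if the behaviour described above holds, the count comes back as 2 for a single, unretried, successful upload:

{code:java}
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

/**
 * Hypothetical reproduction sketch: upload a zero-byte object through
 * the counting provider and print how often the SDK opened the stream.
 */
public class PutStreamCountRepro {
  public static void main(String[] args) {
    CountingContentProvider provider =
        new CountingContentProvider(new byte[0]);

    try (S3Client s3 = S3Client.builder()
        .region(Region.US_WEST_2)
        .build()) {
      s3.putObject(
          PutObjectRequest.builder()
              .bucket("example-bucket--usw2-az1--x-s3")  // placeholder S3 Express bucket
              .key("touchz-test")                        // placeholder key
              .build(),
          RequestBody.fromContentProvider(
              provider, 0, "application/octet-stream"));
    }

    // If the dual-read behaviour occurs, this prints 2 even though
    // nothing failed and nothing was retried.
    System.out.println("streamCreationCount = "
        + provider.getStreamCreationCount());
  }
}
{code}

Running the same provider through a multipart upload would be expected to show one stream creation per part, matching the observation that the dual read does not occur on multipart uploads.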