[
https://issues.apache.org/jira/browse/HADOOP-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869413#comment-17869413
]
ASF GitHub Bot commented on HADOOP-19221:
-----------------------------------------
steveloughran commented on PR #6938:
URL: https://github.com/apache/hadoop/pull/6938#issuecomment-2256520565
* updated PR has the new field; it still needs to be documented, though
* pulled out the fault injector class for reuse

Based on @shameersss1's comments, I've reviewed S3ABlockOutputStream aborting:
* use our own FutureIO to wait for results; this unwraps exceptions for us
* on InterruptedIOException, the upload is aborted but no attempt is made to
  cancel the requests (things are being interrupted, after all)
* an atomic boolean stopFutureUploads is used to signal to future uploads that
  they should stop uploading but still clean up their data
* when the wait for a future IO operation is interrupted, no attempt is made
  to cancel/interrupt the uploads, but that flag is still set (see the sketch
  after this list)
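To make the wait/interrupt handling above concrete, here is a minimal sketch
of the pattern; it is not the actual S3ABlockOutputStream code, and the class
name, field names and shape of the part-upload futures are illustrative only:
{code}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.hadoop.util.functional.FutureIO;

/** Sketch: wait for queued part uploads; on interruption, flag the rest to skip. */
public class PartUploadWaiter {

  /** Signals not-yet-started uploads: skip the PUT but still free the buffer. */
  private final AtomicBoolean stopFutureUploads = new AtomicBoolean(false);

  void waitForAllParts(List<CompletableFuture<Void>> partUploads)
      throws IOException {
    for (CompletableFuture<Void> upload : partUploads) {
      try {
        // FutureIO unwraps ExecutionException into the underlying IOException
        FutureIO.awaitFuture(upload);
      } catch (InterruptedIOException e) {
        // interrupted: no attempt to cancel in-flight requests, just tell
        // the remaining uploads to stop uploading but still clean up
        stopFutureUploads.set(true);
        throw e;
      }
    }
  }

  /** Queued uploads consult this before starting their PUT. */
  boolean shouldSkipUpload() {
    return stopFutureUploads.get();
  }
}
{code}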
Now I'm unsure what the best policy is to avoid ever leaking buffers if an
upload is cancelled:
1. Should we ever use future.cancel()? Or just set stopFutureUploads, knowing
   the uploads are skipped?
2. Would we want the upload stream to somehow trigger a failure which gets
   through the SDK (i.e. no retries?) and then exits? We could do this now
   that we have our own content provider: raise a nonrecoverable
   AwsClientException... (a rough sketch of that idea follows)
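A rough sketch of option 2, assuming a wrapper around the new content provider
plus the stopFutureUploads flag; the class name is invented for the example,
and SdkClientException is only a stand-in for whatever nonrecoverable
exception the real change would raise:
{code}
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;

import software.amazon.awssdk.core.exception.SdkClientException;
import software.amazon.awssdk.http.ContentStreamProvider;

/** Sketch: fail the part upload at the SDK layer once the stream is cancelled. */
public class AbortingContentStreamProvider implements ContentStreamProvider {

  private final ContentStreamProvider wrapped;
  private final AtomicBoolean stopFutureUploads;

  public AbortingContentStreamProvider(ContentStreamProvider wrapped,
      AtomicBoolean stopFutureUploads) {
    this.wrapped = wrapped;
    this.stopFutureUploads = stopFutureUploads;
  }

  @Override
  public InputStream newStream() {
    if (stopFutureUploads.get()) {
      // a client-side exception raised here surfaces through the SDK and,
      // provided it isn't treated as retryable, fails the part upload fast
      throw SdkClientException.create("part upload cancelled by output stream");
    }
    return wrapped.newStream();
  }
}
{code}
Whether the SDK retry policy would treat such an exception as retryable still
needs checking; that is part of the open question above.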
> S3A: Unable to recover from failure of multipart block upload attempt "Status
> Code: 400; Error Code: RequestTimeout"
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-19221
> URL: https://issues.apache.org/jira/browse/HADOOP-19221
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> If a multipart PUT request fails for some reason (e.g. a network error) then
> all subsequent retry attempts fail with a 400 response and error code
> RequestTimeout.
> {code}
> Your socket connection to the server was not read from or written to within
> the timeout period. Idle connections will be closed. (Service: Amazon S3;
> Status Code: 400; Error Code: RequestTimeout; Request ID:; S3 Extended
> Request ID:
> {code}
> The list of suppressed exceptions contains the root cause (the initial
> failure was a 500); all retries failed to upload properly from the source
> input stream {{RequestBody.fromInputStream(fileStream, size)}}.
> Hypothesis: the mark/reset mechanism doesn't work for input streams. On the
> v1 SDK we would build a multipart block upload request passing in (file,
> offset, length); the way we are doing this now doesn't recover.
> Probably fixable by providing our own {{ContentStreamProvider}}
> implementations (the file case is sketched below) for
> # file + offset + length
> # bytebuffer
> # byte array
> The SDK does have explicit support for the in-memory ones, but those copy
> the data blocks first; we don't want that, as it would double the memory
> requirements of active blocks.
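A minimal sketch of the first case (file + offset + length), assuming the v2
SDK's {{ContentStreamProvider}} is implemented directly and commons-io's
{{BoundedInputStream}} is used to cap the read at the block length; the class
name is illustrative, not the committed implementation:
{code}
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

import org.apache.commons.io.input.BoundedInputStream;
import software.amazon.awssdk.http.ContentStreamProvider;

/** Sketch: replayable content provider for a (file, offset, length) block. */
public class FileBlockContentStreamProvider implements ContentStreamProvider {

  private final File file;
  private final long offset;
  private final long length;

  public FileBlockContentStreamProvider(File file, long offset, long length) {
    this.file = file;
    this.offset = offset;
    this.length = length;
  }

  @Override
  public InputStream newStream() {
    try {
      // reopen and reseek on every call, so each SDK retry replays the
      // full part body without any mark/reset or in-memory copy
      FileInputStream in = new FileInputStream(file);
      try {
        in.getChannel().position(offset);
      } catch (IOException e) {
        in.close();
        throw e;
      }
      return new BufferedInputStream(new BoundedInputStream(in, length));
    } catch (IOException e) {
      // newStream() declares no checked exceptions
      throw new UncheckedIOException(e);
    }
  }
}
{code}
The part upload could then pass
{{RequestBody.fromContentProvider(provider, length, "application/octet-stream")}}
so that every retry gets a fresh stream rather than relying on mark/reset.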