[ 
https://issues.apache.org/jira/browse/HADOOP-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19171:
------------------------------------
    Description: 
We've had reports of network connection failures surfacing deeper in the stack 
where we don't convert them to AWSApiCallTimeoutException, so they aren't 
retried properly (retire the connection and repeat).


{code}
Unable to execute HTTP request: Broken pipe (Write failed)
{code}


{code}
 Your socket connection to the server was not read from or written to within 
the timeout period. Idle connections will be closed. (Service: Amazon S3; 
Status Code: 400; Error Code: RequestTimeout
{code}

Note: this is from the v1 SDK, but the 400 error is treated as fail-fast in all 
our versions, and I don't think we do the same for the broken pipe. That one is 
going to be trickier to handle: unless it is coming from the HTTP/TLS 
libraries, "broken pipe" may not be in the newer builds. We'd have to look for 
the string in the SDKs to see what causes it and go from there.
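
As a rough illustration of the kind of handling discussed above, here is a 
minimal sketch that walks an exception's cause chain and flags the two reported 
messages as retryable. It is not the S3A code: ConnectionFailureClassifier, 
RetryableConnectionException, translate() and the depth limit are hypothetical 
names and choices made up for this example, and matching on message fragments is 
an assumption, since where the "broken pipe" text originates in the SDKs is still 
to be determined.

```java
import java.io.IOException;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper; none of these names exist in Hadoop or the AWS SDK. */
public class ConnectionFailureClassifier {

    /** Hypothetical exception type marking a failure as safe to retry. */
    public static class RetryableConnectionException extends IOException {
        public RetryableConnectionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Lower-cased fragments taken from the two error reports in this issue. */
    private static final List<String> RETRYABLE_FRAGMENTS = List.of(
        "broken pipe",
        "error code: requesttimeout");

    /** Assumed bound on how far down the cause chain to look. */
    private static final int MAX_CAUSE_DEPTH = 16;

    /** Returns true if any exception in the cause chain carries a known fragment. */
    public static boolean isRetryableConnectionFailure(Throwable t) {
        Throwable current = t;
        for (int depth = 0; current != null && depth < MAX_CAUSE_DEPTH; depth++) {
            String message = current.getMessage();
            if (message != null) {
                String lower = message.toLowerCase(Locale.ROOT);
                for (String fragment : RETRYABLE_FRAGMENTS) {
                    if (lower.contains(fragment)) {
                        return true;
                    }
                }
            }
            current = current.getCause();
        }
        return false;
    }

    /** Wraps recognised connection failures so a retry policy can act on the type. */
    public static IOException translate(String operation, Exception e) {
        if (isRetryableConnectionFailure(e)) {
            return new RetryableConnectionException(operation + ": " + e.getMessage(), e);
        }
        return e instanceof IOException
            ? (IOException) e
            : new IOException(operation + ": " + e.getMessage(), e);
    }
}
```

A caller would pass the caught exception through translate() and let its retry 
policy key off the exception type rather than re-parsing the message each time.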



  was:
We've had reports of network connection failures surfacing deeper in the stack 
where we don't convert to AWSApiCallTimeoutException so they aren't retried 
properly (retire connection and repeat)


{code}
Unable to execute HTTP request: Broken pipe (Write failed)
{code}


{code}
 Your socket connection to the server was not read from or written to within 
the timeout period. Idle connections will be closed. (Service: Amazon S3; 
Status Code: 400; Error Code: RequestTimeout
{code}

note, this is v1 sdk but the 400 error is treated as fail-fast in all our 
versoins




> S3A: handle alternative forms of connection failure
> ---------------------------------------------------
>
>                 Key: HADOOP-19171
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19171
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0, 3.3.6
>            Reporter: Steve Loughran
>            Priority: Major
>
> We've had reports of network connection failures surfacing deeper in the 
> stack where we don't convert them to AWSApiCallTimeoutException, so they 
> aren't retried properly (retire the connection and repeat).
> {code}
> Unable to execute HTTP request: Broken pipe (Write failed)
> {code}
> {code}
>  Your socket connection to the server was not read from or written to within 
> the timeout period. Idle connections will be closed. (Service: Amazon S3; 
> Status Code: 400; Error Code: RequestTimeout
> {code}
> Note: this is from the v1 SDK, but the 400 error is treated as fail-fast in 
> all our versions, and I don't think we do the same for the broken pipe. That 
> one is going to be trickier to handle: unless it is coming from the HTTP/TLS 
> libraries, "broken pipe" may not be in the newer builds. We'd have to look for 
> the string in the SDKs to see what causes it and go from there.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
