[ https://issues.apache.org/jira/browse/HADOOP-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-19171:
------------------------------------
    Description:
We've had reports of network connection failures surfacing deeper in the stack where we don't convert to AWSApiCallTimeoutException so they aren't retried properly (retire connection and repeat)

{code}
Unable to execute HTTP request: Broken pipe (Write failed)
{code}

{code}
Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout
{code}

Note: this is the v1 SDK, but the 400 error is treated as fail-fast in all our versions, and I don't think we do the same for the broken pipe. That one is going to be trickier to handle: unless it is coming from the HTTP/TLS libraries, "broken pipe" may not be in the newer builds. We'd have to look for the string in the SDKs to see what causes it and go from there.

  was:
We've had reports of network connection failures surfacing deeper in the stack where we don't convert to AWSApiCallTimeoutException so they aren't retried properly (retire connection and repeat)

{code}
Unable to execute HTTP request: Broken pipe (Write failed)
{code}

{code}
Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
(Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout
{code}

Note: this is the v1 SDK, but the 400 error is treated as fail-fast in all our versions

> S3A: handle alternative forms of connection failure
> ---------------------------------------------------
>
>                 Key: HADOOP-19171
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19171
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0, 3.3.6
>            Reporter: Steve Loughran
>            Priority: Major
>
> We've had reports of network connection failures surfacing deeper in the
> stack where we don't convert to AWSApiCallTimeoutException so they aren't
> retried properly (retire connection and repeat)
> {code}
> Unable to execute HTTP request: Broken pipe (Write failed)
> {code}
> {code}
> Your socket connection to the server was not read from or written to within
> the timeout period. Idle connections will be closed. (Service: Amazon S3;
> Status Code: 400; Error Code: RequestTimeout
> {code}
> Note: this is the v1 SDK, but the 400 error is treated as fail-fast in all our
> versions, and I don't think we do the same for the broken pipe. That one is
> going to be trickier to handle: unless it is coming from the HTTP/TLS
> libraries, "broken pipe" may not be in the newer builds. We'd have to look for
> the string in the SDKs to see what causes it and go from there.
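
For illustration only, the classification described in this issue could be sketched roughly as follows. This is not the S3A code or any AWS SDK API: the class `ConnectionFailureClassifier`, its methods and the `Operation` interface are hypothetical names, and the two message fragments are simply the ones quoted in the reports. Matching on message text is brittle, which is why the description says the SDKs would need to be searched for the actual source of the string. The sketch only repeats the call; retiring the underlying connection is not modelled.

```java
import java.io.IOException;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Hadoop or the AWS SDK): decides whether
 * a failure looks like one of the connection problems quoted in the issue.
 */
final class ConnectionFailureClassifier {

  // Fragments copied from the two reported error messages, lower-cased.
  private static final String BROKEN_PIPE = "broken pipe";
  private static final String REQUEST_TIMEOUT = "error code: requesttimeout";

  // Guard against pathological (cyclic) cause chains.
  private static final int MAX_CAUSE_DEPTH = 32;

  private ConnectionFailureClassifier() {
  }

  /** Walks the cause chain and checks each message for a known fragment. */
  static boolean isConnectionFailure(Throwable t) {
    int depth = 0;
    for (Throwable cause = t;
         cause != null && depth < MAX_CAUSE_DEPTH;
         cause = cause.getCause(), depth++) {
      String message = cause.getMessage();
      if (message == null) {
        continue;
      }
      String lower = message.toLowerCase(Locale.ROOT);
      if (lower.contains(BROKEN_PIPE) || lower.contains(REQUEST_TIMEOUT)) {
        return true;
      }
    }
    return false;
  }

  /** An operation that may fail with an IOException (assumed shape). */
  interface Operation<T> {
    T call() throws IOException;
  }

  /**
   * Repeats the operation while it fails with a recognised connection
   * failure, up to maxAttempts calls; any other failure is rethrown at once.
   */
  static <T> T retryOnConnectionFailure(Operation<T> op, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (IOException e) {
        if (!isConnectionFailure(e)) {
          throw e;   // fail fast on anything unrecognised
        }
        last = e;    // recognised connection failure: try again
      }
    }
    throw last;
  }
}
```

A real fix would more likely key off exception types or error codes once the origin of "broken pipe" in the SDK and HTTP/TLS libraries is known, rather than off message text as this sketch does.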