[ 
https://issues.apache.org/jira/browse/HADOOP-18927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773622#comment-17773622
 ] 

Steve Loughran commented on HADOOP-18927:
-----------------------------------------

Full v1 SDK stack trace below. I know this is v1, but v2 is where we are doing 
the retry logic improvements; this is just one example to address.

*if we map SocketException to the connectivity failure retry policy, is there 
any fatal exception we will end up blocking on?*

{code}
Error: org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #51 upload ID YzA0NjZlNTgtZmI3MC00MDMxLTg2ZTMtOWZhZjdjYmI3ZWFjLjJlNjFmNjEyLTU1ZmEtNGY1NS05OGU2LWQyOTRmMzgyZGU4NA on benchmarks/TestDFSIO/io_data/test_io_100: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection reset by peer: Unable to execute HTTP request: Connection reset by peer
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:209)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:121)
    at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:348)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:440)
    at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:344)
    at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:319)
    at org.apache.hadoop.fs.s3a.WriteOperationHelper.retry(WriteOperationHelper.java:207)
    at org.apache.hadoop.fs.s3a.WriteOperationHelper.uploadPart(WriteOperationHelper.java:640)
    at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.lambda$uploadBlockAsync$0(S3ABlockOutputStream.java:819)
    at org.apache.hadoop.util.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:219)
    at org.apache.hadoop.util.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:219)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection reset by peer
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1219)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1165)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
    at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3887)
    at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3872)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:2813)
    at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$uploadPart$10(WriteOperationHelper.java:645)
    at org.apache.hadoop.fs.store.audit.AuditingFunctions.lambda$withinAuditSpan$0(AuditingFunctions.java:62)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:119)
    ... 15 more
Caused by: java.net.SocketException: Connection reset by peer
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
    at com.amazonaws.thirdparty.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:124)
    at com.amazonaws.thirdparty.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:136)
    at com.amazonaws.thirdparty.apache.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:167)
    at com.amazonaws.thirdparty.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)
    at com.amazonaws.thirdparty.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:144)
    at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:160)
    at com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
    at com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:152)
    at com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
    at com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    ... 30 more

{code}
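
For discussion, a rough sketch of what "treat a SocketException anywhere in the cause chain as a connectivity failure" could look like. This is not the S3ARetryPolicy code: the class {{ConnectivityClassifier}} and its methods are hypothetical names made up for illustration, and it uses only JDK types. It also bears on the question above, because {{ConnectException}}, {{NoRouteToHostException}}, {{BindException}} and {{PortUnreachableException}} are all subclasses of {{SocketException}}, so a blanket mapping would pick those up too.

```java
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of hadoop-aws. */
final class ConnectivityClassifier {

  private ConnectivityClassifier() {
  }

  /**
   * Walk the cause chain and return the first throwable of the given type,
   * or null if none is found. The identity set guards against cyclic chains.
   */
  static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    for (Throwable c = t; c != null && seen.add(c); c = c.getCause()) {
      if (type.isInstance(c)) {
        return type.cast(c);
      }
    }
    return null;
  }

  /**
   * Assumption: any SocketException (including its subclasses) or
   * UnknownHostException in the chain counts as a connectivity failure.
   */
  static boolean isConnectivityFailure(Throwable t) {
    return findCause(t, SocketException.class) != null
        || findCause(t, UnknownHostException.class) != null;
  }
}
```

With the trace above, the SocketException is two levels down (wrapper exception, then SdkClientException, then SocketException), so a check on the top-level exception type alone would miss it. Whether every SocketException subclass deserves a retry is exactly the open question.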


> S3ARetryHandler to treat SocketExceptions as connectivity failures
> ------------------------------------------------------------------
>
>                 Key: HADOOP-18927
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18927
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.6
>            Reporter: Steve Loughran
>            Priority: Major
>
> I've got a v1 SDK stack trace where a TCP connection reset is breaking a 
> large upload. That should be recoverable with retries.
> {code}
> com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection 
> reset by peer: Unable to execute HTTP request: Connection reset by peer at...
> {code}
> Proposed:
> * S3ARetryPolicy to map SocketException to connectivity failure.
> * See if we can create a test for this, ideally under the AWS SDK.
>
> I'm now unsure about how well we handle these IO problems. A quick 
> experiment with the 3.3.5 release shows that the retry policy retries on 
> any exception chain that contains an unknown host for the endpoint.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
