[
https://issues.apache.org/jira/browse/HADOOP-18927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773622#comment-17773622
]
Steve Loughran commented on HADOOP-18927:
-----------------------------------------
Full v1 stack trace below. I know this is v1, but v2 is where we are doing the
retry logic improvements; this is just one example to address.
*if we map SocketException to the connectivity failure retry policy, is there
any fatal exception we will end up blocking on?*
{code}
Error: org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #51 upload ID YzA0NjZlNTgtZmI3MC00MDMxLTg2ZTMtOWZhZjdjYmI3ZWFjLjJlNjFmNjEyLTU1ZmEtNGY1NS05OGU2LWQyOTRmMzgyZGU4NA on benchmarks/TestDFSIO/io_data/test_io_100: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection reset by peer: Unable to execute HTTP request: Connection reset by peer
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:209)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:121)
    at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:348)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:440)
    at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:344)
    at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:319)
    at org.apache.hadoop.fs.s3a.WriteOperationHelper.retry(WriteOperationHelper.java:207)
    at org.apache.hadoop.fs.s3a.WriteOperationHelper.uploadPart(WriteOperationHelper.java:640)
    at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.lambda$uploadBlockAsync$0(S3ABlockOutputStream.java:819)
    at org.apache.hadoop.util.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:219)
    at org.apache.hadoop.util.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:219)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection reset by peer
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1219)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1165)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
    at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3887)
    at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3872)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:2813)
    at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$uploadPart$10(WriteOperationHelper.java:645)
    at org.apache.hadoop.fs.store.audit.AuditingFunctions.lambda$withinAuditSpan$0(AuditingFunctions.java:62)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:119)
    ... 15 more
Caused by: java.net.SocketException: Connection reset by peer
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1035)
    at com.amazonaws.thirdparty.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:124)
    at com.amazonaws.thirdparty.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:136)
    at com.amazonaws.thirdparty.apache.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:167)
    at com.amazonaws.thirdparty.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)
    at com.amazonaws.thirdparty.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:144)
    at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:160)
    at com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
    at com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:152)
    at com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
    at com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    ... 30 more
{code}
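One way to explore the question above is sketched here. In the trace the
java.net.SocketException sits two levels down the cause chain, and in the JDK
BindException, ConnectException, NoRouteToHostException and
PortUnreachableException are all subclasses of SocketException, so a blanket
mapping would cover them too. The class name, the Category enum and the choice
of which subclasses fail fast are all assumptions made for illustration; none
of this is existing S3A code.

```java
import java.net.BindException;
import java.net.NoRouteToHostException;
import java.net.SocketException;

/** Hypothetical helper, not part of S3A: classifies a failure by its cause chain. */
final class SocketFailureClassifier {

  /** Hypothetical categories, loosely mirroring the policies under discussion. */
  enum Category { CONNECTIVITY_RETRY, FAIL_FAST, UNCLASSIFIED }

  /** Walk the cause chain and return the first SocketException, or null if none. */
  static SocketException findSocketException(Throwable t) {
    int depth = 0;
    // The depth limit guards against a cyclic cause chain.
    for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
      if (c instanceof SocketException) {
        return (SocketException) c;
      }
    }
    return null;
  }

  static Category classify(Throwable t) {
    SocketException se = findSocketException(t);
    if (se == null) {
      return Category.UNCLASSIFIED;
    }
    // ASSUMPTION: these two subclasses are treated as unrecoverable here purely
    // to show where a carve-out would go; the real policy may differ.
    if (se instanceof BindException || se instanceof NoRouteToHostException) {
      return Category.FAIL_FAST;
    }
    // Plain SocketException (e.g. "Connection reset by peer") and the
    // remaining subclasses are treated as retryable connectivity failures.
    return Category.CONNECTIVITY_RETRY;
  }
}
```

With this sketch, a chain shaped like the one in the trace (an IOException
wrapping an SDK exception wrapping a SocketException) classifies as
CONNECTIVITY_RETRY, while a NoRouteToHostException anywhere in the chain
classifies as FAIL_FAST.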
> S3ARetryHandler to treat SocketExceptions as connectivity failures
> ------------------------------------------------------------------
>
> Key: HADOOP-18927
> URL: https://issues.apache.org/jira/browse/HADOOP-18927
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.6
> Reporter: Steve Loughran
> Priority: Major
>
> I've got a v1 SDK stack trace where a TCP connection reset is breaking a
> large upload. That should be recoverable with retries.
> {code}
> com.amazonaws.SdkClientException: Unable to execute HTTP request: Connection
> reset by peer: Unable to execute HTTP request: Connection reset by peer at...
> {code}
> proposed:
> * S3ARetryPolicy to map SocketException to connectivity failure
> * See if we can create a test for this, ideally under the aws sdk.
> I'm now unsure how well we handle these IO problems. A quick
> experiment with the 3.3.5 release shows that the retry policy retries on
> whatever exception chain has an unknown host for the endpoint.