[ https://issues.apache.org/jira/browse/HADOOP-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014225#comment-16014225 ]

Steve Loughran commented on HADOOP-14303:
-----------------------------------------

Transient failure, seen on the s3guard+committer branch, but without s3guard turned on.

A 400 is trouble as it has so many meanings. The good news: the usual unrecoverable failures (auth &c.) will show up in FS.initialize(), so if we don't retry there (or only retry a couple of times), then a retry strategy may well work everywhere else. A rough sketch of what such a wrapper could look like follows the stack trace.


{code}
Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir
Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.903 sec <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir
testRecursiveRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 6.394 sec  <<< ERROR!
org.apache.hadoop.fs.s3a.AWSS3IOException: getFileStatus on user/stevel/: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: A67536F45A9683CE), S3 Extended Request ID: fkUz/wPcebNi4Mp5fAwVWRw/BEPv/2fmn74+1bEqCft/yhp3xMfcQSYI7O56YF1YZ7NfDLUTzmw=: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: A67536F45A9683CE)
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:179)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1932)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1874)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1836)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1661)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1637)
        at org.apache.hadoop.fs.contract.ContractTestUtils.treeWalk(ContractTestUtils.java:1211)
        at org.apache.hadoop.fs.contract.ContractTestUtils.treeWalk(ContractTestUtils.java:1218)
        at org.apache.hadoop.fs.contract.ContractTestUtils.treeWalk(ContractTestUtils.java:1218)
        at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRecursiveRootListing(AbstractContractRootDirectoryTest.java:221)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: A67536F45A9683CE)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1586)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1254)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4185)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4132)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1245)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1130)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1915)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1874)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1836)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1661)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1637)
        at org.apache.hadoop.fs.contract.ContractTestUtils.treeWalk(ContractTestUtils.java:1211)
        at org.apache.hadoop.fs.contract.ContractTestUtils.treeWalk(ContractTestUtils.java:1218)
        at org.apache.hadoop.fs.contract.ContractTestUtils.treeWalk(ContractTestUtils.java:1218)
        at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRecursiveRootListing(AbstractContractRootDirectoryTest.java:221)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
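
Purely as an illustration of the kind of unified wrapper meant above (a sketch, not S3A code: {{RetrySketch}} and {{retryingInvoke}} are made-up names, and the only SDK call assumed is the 1.x SDK's {{AmazonServiceException.getStatusCode()}}):

{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

import com.amazonaws.AmazonServiceException;

/** Illustrative sketch only; names are invented, not actual S3A code. */
public final class RetrySketch {

  /**
   * Invoke an operation, retrying throttling/5xx responses with
   * exponential backoff plus jitter. 4xx responses (auth, bad request)
   * are rethrown immediately, on the assumption that they surface in
   * FS.initialize() and are unrecoverable.
   *
   * @param maxAttempts total attempts, >= 1
   * @param baseDelayMs initial backoff in milliseconds, > 0
   */
  public static <T> T retryingInvoke(int maxAttempts, long baseDelayMs,
      Callable<T> operation) throws Exception {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return operation.call();
      } catch (AmazonServiceException e) {
        if (e.getStatusCode() < 500 || attempt == maxAttempts) {
          throw e;  // not retriable, or out of attempts
        }
        // exponential backoff with jitter: base * 2^(attempt-1) + random
        long delay = baseDelayMs << (attempt - 1);
        Thread.sleep(delay + ThreadLocalRandom.current().nextLong(delay));
      }
    }
    throw new IllegalStateException("maxAttempts must be >= 1");
  }
}
{code}

The point being: a 400 caught during FS.initialize() never reaches this path, while throttling (503) and other 5xx responses get backed off and retried.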


> Review retry logic on all S3 SDK calls, implement where needed
> --------------------------------------------------------------
>
>                 Key: HADOOP-14303
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14303
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>
> AWS S3, IAM, KMS, DDB etc. all throttle callers: the S3A code needs to handle this without failing, as if it slows down its requests it can recover.
> 1. Look at all the places where we are calling S3A via the AWS SDK and make sure we are retrying with some backoff & jitter policy, ideally something unified. This must be more systematic than the case-by-case, problem-by-problem strategy we are implicitly using.
> 2. Many of the AWS S3 SDK calls do implement retry (e.g. PUT/multipart PUT), but we need to check the other parts of the process: login, initiate/complete MPU, ...
> Related
> HADOOP-13811 Failed to sanitize XML document destined for handler class
> HADOOP-13664 S3AInputStream to use a retry policy on read failures
> This stuff is all hard to test. A key need is to be able to differentiate recoverable throttle & network failures from unrecoverable problems like auth and network config (e.g. a bad endpoint).
> This may be the opportunity to add a faulting subclass of the Amazon S3 client which can be configured in IT tests to fail at specific points. Ryan Blue's mock S3 client does this in HADOOP-13786, but it is a 100% mock. I'm thinking of something with similar fault raising, but in front of the real S3A client.

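On the fault-injection idea in the last paragraph above: a rough sketch of what a fault-raising client in front of the real S3 service could look like for IT tests. It assumes the 1.x SDK's {{AmazonS3Client}} can be subclassed with {{getObjectMetadata}} overridden; the class name and the {{failEvery}} knob are invented for illustration, and this is not the HADOOP-13786 mock.

{code}
import java.util.concurrent.atomic.AtomicInteger;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.GetObjectMetadataRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;

/**
 * Illustrative only: an S3 client sitting in front of the real service
 * which raises a throttling response on every N-th HEAD request, so that
 * retry handling can be exercised in IT tests against a live bucket.
 */
public class FaultInjectingS3Client extends AmazonS3Client {

  private final AtomicInteger headCount = new AtomicInteger();
  private final int failEvery;

  public FaultInjectingS3Client(AWSCredentials credentials, int failEvery) {
    super(credentials);
    this.failEvery = failEvery;
  }

  @Override
  public ObjectMetadata getObjectMetadata(GetObjectMetadataRequest request) {
    if (failEvery > 0 && headCount.incrementAndGet() % failEvery == 0) {
      // simulate S3 throttling rather than failing the whole test run
      AmazonS3Exception e = new AmazonS3Exception("simulated throttling");
      e.setStatusCode(503);
      e.setErrorCode("SlowDown");
      throw e;
    }
    // otherwise pass the request through to the real service
    return super.getObjectMetadata(request);
  }
}
{code}

If a test run can point S3A at such a client, every N-th HEAD surfaces as a throttling response, which is exactly the kind of recoverable failure the retry policy has to absorb.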

