[jira] [Work logged] (HADOOP-18310) Add option and make 400 bad request retryable

ASF GitHub Bot (Jira) Fri, 24 Jun 2022 04:05:08 -0700


     [ 
https://issues.apache.org/jira/browse/HADOOP-18310?focusedWorklogId=784536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-784536
 ]


ASF GitHub Bot logged work on HADOOP-18310:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Jun/22 11:04
            Start Date: 24/Jun/22 11:04
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on code in PR #4483:
URL: https://github.com/apache/hadoop/pull/4483#discussion_r905947970


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1203,4 +1203,18 @@ private Constants() {
    * Default maximum read size in bytes during vectored reads : {@value}.
    */
   public static final int DEFAULT_AWS_S3_VECTOR_READS_MAX_MERGED_READ_SIZE = 
1253376; //1M
+
+  /**
+   * Flag for immediate failure when observing a {@link 
AWSBadRequestException}.
+   * If it's disabled and set to false, the failure is treated as retryable.
+   * Value {@value}.
+   */
+  public static final String FAIL_ON_AWS_BAD_REQUEST = 
"fs.s3a.fail.on.aws.bad.request";

Review Comment:
   I now think "fs.s3a.retry.on.400.response.enabled" would be better, with 
default flipped. docs would say "experimental"
   
   and assuming we do have a custom policy, adjacent 
   
   ```
   fs.s3a.retry.on.400.response.delay  // delay between attempts, default "10s"
   fs.s3a.retry.on.400.response.attempts // number of attempts, default 6
   ```
   





Issue Time Tracking
-------------------

    Worklog Id:     (was: 784536)
    Time Spent: 2h 20m  (was: 2h 10m)

> Add option and make 400 bad request retryable
> ---------------------------------------------
>
>                 Key: HADOOP-18310
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18310
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.3.4
>            Reporter: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When one is using a customized credential provider via 
> fs.s3a.aws.credentials.provider, e.g. 
> org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider, when the provided 
> credential by this pluggable provider is expired and return an error code of 
> 400 as bad request exception.
> Here, the current S3ARetryPolicy will fail immediately and does not retry on 
> the S3A level. 
> Our recent use case in HBase found this use case could lead to a Region 
> Server got immediate abandoned from this Exception without retry, when the 
> file system is trying open or S3AInputStream is trying to reopen the file. 
> especially the S3AInputStream use cases, we cannot find a good way to retry 
> outside of the file system semantic (because if a ongoing stream is failing 
> currently it's considered as irreparable state), and thus we come up with 
> this optional flag for retrying in S3A.
> {code}
> Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The provided 
> token has expired. (Service: Amazon S3; Status Code: 400; Error Code: 
> ExpiredToken; Request ID: XYZ; S3 Extended Request ID: ABC; Proxy: null), S3 
> Extended Request ID: 123
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
>       at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
>       at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
>       at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5453)
>       at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5400)
>       at 
> com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1524)
>       at 
> org.apache.hadoop.fs.s3a.S3AFileSystem$InputStreamCallbacksImpl.getObject(S3AFileSystem.java:1506)
>       at 
> org.apache.hadoop.fs.s3a.S3AInputStream.lambda$reopen$0(S3AInputStream.java:217)
>       at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
>       ... 35 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Work logged] (HADOOP-18310) Add option and make 400 bad request retryable

Reply via email to