steveloughran commented on code in PR #4483:
URL: https://github.com/apache/hadoop/pull/4483#discussion_r905945310


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java:
##########
@@ -214,7 +214,10 @@ protected Map<Class<? extends Exception>, RetryPolicy> 
createExceptionMap() {
 
     // policy on a 400/bad request still ambiguous.
     // Treated as an immediate failure
-    policyMap.put(AWSBadRequestException.class, fail);
+    RetryPolicy awsBadRequestExceptionRetryPolicy =
+        configuration.getBoolean(FAIL_ON_AWS_BAD_REQUEST, 
DEFAULT_FAIL_ON_AWS_BAD_REQUEST) ?
+            fail : retryIdempotentCalls;

Review Comment:
   1. should retry on all calls, rather than just idempotent ones, as long as 
we are confident that the request is never executed before the failure
   2. I don't believe the normal exponential backoff strategy is the right one, 
as the initial delays are very short lived (500ms), whereas if you are hoping 
that credential providers will fetch new credentials, an initial delay of a few 
seconds would seem better. I wouldn't even bother with exponential growth here, 
just say 6 times at 10 seconds.
   
   I think we would also want to log at warn that this is happening, assuming 
this is rare. 



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1203,4 +1203,18 @@ private Constants() {
    * Default maximum read size in bytes during vectored reads : {@value}.
    */
   public static final int DEFAULT_AWS_S3_VECTOR_READS_MAX_MERGED_READ_SIZE = 
1253376; //1M
+
+  /**
+   * Flag for immediate failure when observing a {@link 
AWSBadRequestException}.
+   * If it's disabled and set to false, the failure is treated as retryable.
+   * Value {@value}.
+   */
+  public static final String FAIL_ON_AWS_BAD_REQUEST = 
"fs.s3a.fail.on.aws.bad.request";

Review Comment:
   I now think "fs.s3a.retry.on.400.response.enabled" would be better, with 
default flipped. docs would say "experimental"
   
   and assuming we do have a custom policy, adjacent 
   
   ```
   fs.s3a.retry.on.400.response.delay  // delay between attempts, default "10s"
   fs.s3a.retry.on.400.response.attempts // number of attempts, default 6
   ```
   
   fs.s3a.retry.on.400



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestInvoker.java:
##########
@@ -311,12 +311,25 @@ public void testRetryAWSConnectivity() throws Throwable {
    */
   @Test(expected = AWSBadRequestException.class)
   public void testRetryBadRequestNotIdempotent() throws Throwable {
-    invoker.retry("test", null, false,
+
+    invoker.retry("test", null, true,
         () -> {
           throw BAD_REQUEST;
         });
   }
 
+  @Test
+  public void testRetryBadRequestIdempotent() throws Throwable {

Review Comment:
   test looks ok.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to