[ https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603774#comment-16603774 ]
Aaron Fabbri commented on HADOOP-15426:
---------------------------------------

Looking at the v9 patch. Integration tests passed in us-west-2. Overall I think this looks good. I have one comment below that needs addressing (the doubly nested retry in getVersionMarkerItem()).

We are almost to the point where the slogan of S3A can be: Never Give Up*

*(until your retry policy has been exhausted)

Pretty cool.

{noformat}
-  private RetryPolicy dataAccessRetryPolicy;
+  /**
+   * This policy is purely for batched writes, not for processing
+   * exceptions in invoke() calls.
+   */
+  private RetryPolicy batchWriteRetryPolicy;
{noformat}

Later on in the DynamoDBMetadataStore (DDBMS) code:

{noformat}
+  Item getVersionMarkerItem() throws IOException {
     final PrimaryKey versionMarkerKey =
         createVersionMarkerPrimaryKey(VERSION_MARKER);
     int retryCount = 0;
-    Item versionMarker = table.getItem(versionMarkerKey);
+    Item versionMarker = queryVersionMarker(versionMarkerKey);
     while (versionMarker == null) {
       try {
-        RetryPolicy.RetryAction action = dataAccessRetryPolicy.shouldRetry(null,
+        RetryPolicy.RetryAction action = batchWriteRetryPolicy.shouldRetry(null,
             retryCount, 0, true);
         if (action.action == RetryPolicy.RetryAction.RetryDecision.FAIL) {
           break;
@@ -1085,11 +1183,26 @@ private Item getVersionMarkerItem() throws IOException {
         throw new IOException("initTable: Unexpected exception", e);
       }
       retryCount++;
-      versionMarker = table.getItem(versionMarkerKey);
+      versionMarker = queryVersionMarker(versionMarkerKey);
     }
     return versionMarker;
   }

+  /**
+   * Issue the query to get the version marker, with throttling for overloaded
+   * DDB tables.
+   * @param versionMarkerKey key to look up
+   * @return the marker
+   * @throws IOException failure
+   */
+  @Retries.RetryTranslated
+  private Item queryVersionMarker(final PrimaryKey versionMarkerKey)
+      throws IOException {
{noformat}

Two thoughts:
1. Do you want to update the comment at the point where batchWriteRetryPolicy is declared?
2. Is the nested/double retry logic in getVersionMarkerItem() still necessary? Why not just call queryVersionMarker() directly, now that it has a retry wrapper? (A sketch of this follows below.)

Also, a minor tidiness nit: do you want to move the initialization of DynamoDBMetadataStore#invoker into initDataAccessRetries as well? It is getting a little confusing with the multiple invokers spread about. You might also want to add a comment to invoker saying it is the "utility" invoker for other operations, such as table provisioning and deletion (also sketched below).

{noformat}
+The DynamoDB costs come from the number of entries stores and the allocated capacity.
{noformat}

s/stores/stored/

Thanks for adding to the docs; it looks good.
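To make thought 2 concrete, here is a minimal sketch of the simplified method, assuming the @Retries.RetryTranslated wrapper on queryVersionMarker() already handles the throttling and backoff. (If a null marker still needs to be polled for, the outer loop would have to stay, but only to re-check for null, not to back off on throttling.)

{noformat}
  /**
   * Get the version marker, leaning entirely on queryVersionMarker()
   * for retry, backoff and exception translation.
   */
  @Retries.RetryTranslated
  Item getVersionMarkerItem() throws IOException {
    final PrimaryKey versionMarkerKey =
        createVersionMarkerPrimaryKey(VERSION_MARKER);
    // No shouldRetry()/retryCount bookkeeping here: throttling is
    // already handled inside queryVersionMarker().
    return queryVersionMarker(versionMarkerKey);
  }
{noformat}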
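And a rough sketch of the tidiness nit, pulling all the retry wiring into initDataAccessRetries; the config key, default, intervals and retry-callback choice here are illustrative, not what the patch does:

{noformat}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.Invoker;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

  /** Purely for batched writes, not for exceptions in invoke() calls. */
  private RetryPolicy batchWriteRetryPolicy;

  /**
   * The "utility" invoker: used for other operations such as table
   * provisioning and deletion, not the main read/write paths.
   */
  private Invoker invoker;

  /** One place where every retry policy and invoker gets set up. */
  private void initDataAccessRetries(Configuration config) {
    // Illustrative key and default; the real constants live in
    // org.apache.hadoop.fs.s3a.Constants.
    int maxRetries = config.getInt("fs.s3a.s3guard.ddb.max.retries", 9);
    batchWriteRetryPolicy = RetryPolicies.exponentialBackoffRetry(
        maxRetries, 100, TimeUnit.MILLISECONDS);
    invoker = new Invoker(
        RetryPolicies.retryUpToMaximumCountWithFixedSleep(
            maxRetries, 100, TimeUnit.MILLISECONDS),
        Invoker.LOG_EVENT);
  }
{noformat}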
> Make S3guard client resilient to DDB throttle events and network failures
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-15426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15426
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>         Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch,
> HADOOP-15426-003.patch, HADOOP-15426-004.patch, HADOOP-15426-005.patch,
> HADOOP-15426-006.patch, HADOOP-15426-007.patch, HADOOP-15426-008.patch,
> HADOOP-15426-009.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot
> 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png, Screen
> Shot 2018-07-27 at 14.07.38.png,
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> Managed to create this on a parallel test run:
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file:
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
> The level of configured provisioned throughput for the table was exceeded.
> Consider increasing your provisioning level with the UpdateTable API.
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of
> configured provisioned throughput for the table was exceeded. Consider
> increasing your provisioning level with the UpdateTable API. (Service:
> AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
> at
> {code}
> We should be able to handle this. Note that it surfaces as a 400 "bad things happened" error, though, not the 503 from S3.
> h3. We need a retry handler for DDB throttle operations