[ https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603774#comment-16603774 ]
Aaron Fabbri commented on HADOOP-15426:
---------------------------------------

Looking at the v9 patch. Integration tests passed in us-west-2. Overall I think this looks good. I have one comment below that needs addressing (the doubly nested retry in getVersionMarkerItem()).

We are almost to the point where the slogan of S3A can be: Never Give Up*

*(until your retry policy has been exhausted)

Pretty cool.

{noformat}
-  private RetryPolicy dataAccessRetryPolicy;
+  /**
+   * This policy is purely for batched writes, not for processing
+   * exceptions in invoke() calls.
+   */
+  private RetryPolicy batchWriteRetryPolicy;
{noformat}

Later on in the DynamoDBMetadataStore (DDBMS) code:

{noformat}
+  Item getVersionMarkerItem() throws IOException {
     final PrimaryKey versionMarkerKey =
         createVersionMarkerPrimaryKey(VERSION_MARKER);
     int retryCount = 0;
-    Item versionMarker = table.getItem(versionMarkerKey);
+    Item versionMarker = queryVersionMarker(versionMarkerKey);
     while (versionMarker == null) {
       try {
-        RetryPolicy.RetryAction action = dataAccessRetryPolicy.shouldRetry(null,
+        RetryPolicy.RetryAction action = batchWriteRetryPolicy.shouldRetry(null,
             retryCount, 0, true);
         if (action.action == RetryPolicy.RetryAction.RetryDecision.FAIL) {
           break;
@@ -1085,11 +1183,26 @@ private Item getVersionMarkerItem() throws IOException {
         throw new IOException("initTable: Unexpected exception", e);
       }
       retryCount++;
-      versionMarker = table.getItem(versionMarkerKey);
+      versionMarker = queryVersionMarker(versionMarkerKey);
     }
     return versionMarker;
   }

+  /**
+   * Issue the query to get the version marker, with throttling for overloaded
+   * DDB tables.
+   * @param versionMarkerKey key to look up
+   * @return the marker
+   * @throws IOException failure
+   */
+  @Retries.RetryTranslated
+  private Item queryVersionMarker(final PrimaryKey versionMarkerKey)
+      throws IOException {
{noformat}

Two thoughts:
1. Do you want to update the comment at the point where batchWriteRetryPolicy is declared?
2. Is the nested/double retry logic in getVersionMarkerItem() still necessary? Why not just call queryVersionMarker() directly, now that it has a retry wrapper? (A sketch of this follows below.)

Also, a minor tidiness nit: do you want to move the initialization of DynamoDBMetadataStore#invoker into initDataAccessRetries as well? It is getting a little confusing with the multiple invokers spread about. You might also want to add a comment to invoker saying it is the "utility" invoker for other operations, such as table provisioning and deletion (also sketched below).

{noformat}
+The DynamoDB costs come from the number of entries stores and the allocated capacity.
{noformat}

s/stores/stored/

Thanks for adding to the docs; it looks good.
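To make thought 2 concrete, here is a minimal sketch of the simplified method, assuming the @Retries.RetryTranslated wrapper on queryVersionMarker() already handles the throttling and backoff. (If a null marker still needs to be polled for, the outer loop would have to stay, but only to re-check for null, not to back off on throttling.)

{noformat}
  /**
   * Get the version marker, leaning entirely on queryVersionMarker()
   * for retry, backoff and exception translation.
   */
  @Retries.RetryTranslated
  Item getVersionMarkerItem() throws IOException {
    final PrimaryKey versionMarkerKey =
        createVersionMarkerPrimaryKey(VERSION_MARKER);
    // No shouldRetry()/retryCount bookkeeping here: throttling is
    // already handled inside queryVersionMarker().
    return queryVersionMarker(versionMarkerKey);
  }
{noformat}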
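And a rough sketch of the tidiness nit, pulling all the retry wiring into initDataAccessRetries; the config key, default, intervals and retry-callback choice here are illustrative, not what the patch does:

{noformat}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.Invoker;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

  /** Purely for batched writes, not for exceptions in invoke() calls. */
  private RetryPolicy batchWriteRetryPolicy;

  /**
   * The "utility" invoker: used for other operations such as table
   * provisioning and deletion, not the main read/write paths.
   */
  private Invoker invoker;

  /** One place where every retry policy and invoker gets set up. */
  private void initDataAccessRetries(Configuration config) {
    // Illustrative key and default; the real constants live in
    // org.apache.hadoop.fs.s3a.Constants.
    int maxRetries = config.getInt("fs.s3a.s3guard.ddb.max.retries", 9);
    batchWriteRetryPolicy = RetryPolicies.exponentialBackoffRetry(
        maxRetries, 100, TimeUnit.MILLISECONDS);
    invoker = new Invoker(
        RetryPolicies.retryUpToMaximumCountWithFixedSleep(
            maxRetries, 100, TimeUnit.MILLISECONDS),
        Invoker.LOG_EVENT);
  }
{noformat}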
> Make S3guard client resilient to DDB throttle events and network failures
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-15426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15426
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>         Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch,
> HADOOP-15426-003.patch, HADOOP-15426-004.patch, HADOOP-15426-005.patch,
> HADOOP-15426-006.patch, HADOOP-15426-007.patch, HADOOP-15426-008.patch,
> HADOOP-15426-009.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot
> 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png, Screen
> Shot 2018-07-27 at 14.07.38.png,
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> Managed to create this on a parallel test run:
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file:
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
> The level of configured provisioned throughput for the table was exceeded.
> Consider increasing your provisioning level with the UpdateTable API.
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of
> configured provisioned throughput for the table was exceeded. Consider
> increasing your provisioning level with the UpdateTable API. (Service:
> AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
> at
> {code}
> We should be able to handle this. Note that it surfaces as a 400 "bad things happened" error, though, not the 503 from S3.
> h3. We need a retry handler for DDB throttle operations