[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Fabbri updated HADOOP-13904:
----------------------------------
    Attachment: HADOOP-13904-HADOOP-13345.002.patch

Attaching v2 patch:

Add exponential backoff timer to batched DynamoDB operations.
Add scale tests for both MetadataStore implementations.

Also, fix copy/paste typo in parallel test exclusions.

Some output from testing is pasted below. You can compare it with the output from before I added the backoff timer:
{noformat}
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 128 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 0 took 341 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 487 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 1 took 543 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 559 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 2 took 609 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 138 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 0 took 204 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 331 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 1 took 418 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 1124 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 2 took 1185 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 1158 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 3 took 1206 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 250 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 0 took 311 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 469 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 1 took 528 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 630 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 2 took 1829 msec
DynamoDBMetadataStore.java:retryBackoff(511)) - Sleeping 886 msec before next retry
DynamoDBMetadataStore.java:processBatchWriteRequest(481)) - Retry 3 took 931 msec
{noformat}

Tested via the included scale test in US West 2. Also note that this test actually *can* be run in parallel with others, since it uses its own (fake) bucket name.

This patch is based on top of HADOOP-13876, so there may be Jenkins failures without it.

> DynamoDBMetadataStore to handle DDB throttling failures through retry policy
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13904
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13904
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13904-HADOOP-13345.001.patch,
> HADOOP-13904-HADOOP-13345.002.patch
>
>
> When you overload DDB, you get error messages warning of throttling, [as
> documented by
> AWS|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.MessagesAndCodes].
> Reduce load on DDB by doing a table lookup before the create, then, in table
> create/delete operations and in get/put actions, recognise the error codes
> and retry using an appropriate retry policy (exponential backoff + ultimate
> failure).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
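
For readers following the v2 patch notes in this message, here is a minimal sketch of a jittered exponential backoff timer wrapped around a retried operation. It is not the code in the attached patch: the class name BackoffSketch, its methods, the base and cap values, and the choice of picking each sleep at random from the upper half of a doubling ceiling are all assumptions made for illustration.

```java
import java.util.Random;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Hypothetical sketch; none of these names come from the HADOOP-13904 patch. */
public final class BackoffSketch {
  private final long baseMsec;       // assumed ceiling for the first sleep
  private final long maxMsec;        // assumed cap on any single sleep
  private final Random random;
  private final LongConsumer sleeper; // injected so tests need not really sleep

  public BackoffSketch(long baseMsec, long maxMsec, Random random, LongConsumer sleeper) {
    if (baseMsec <= 0 || maxMsec < baseMsec) {
      throw new IllegalArgumentException("need 0 < baseMsec <= maxMsec");
    }
    this.baseMsec = baseMsec;
    this.maxMsec = maxMsec;
    this.random = random;
    this.sleeper = sleeper;
  }

  /** Variant that really sleeps; restores the interrupt flag if interrupted. */
  public static BackoffSketch withThreadSleep(long baseMsec, long maxMsec) {
    return new BackoffSketch(baseMsec, maxMsec, new Random(), msec -> {
      try {
        Thread.sleep(msec);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted during backoff", e);
      }
    });
  }

  /** Upper bound on the sleep before retry number retryCount (0-based). */
  public long ceilingMsec(int retryCount) {
    long ceiling = baseMsec;
    for (int i = 0; i < retryCount && ceiling < maxMsec; i++) {
      // Double each time, clamping at maxMsec without overflowing.
      ceiling = (ceiling > maxMsec / 2) ? maxMsec : ceiling * 2;
    }
    return ceiling;
  }

  /** Random sleep in [ceiling/2, ceiling], so concurrent clients spread out. */
  public long nextSleepMsec(int retryCount) {
    long ceiling = ceilingMsec(retryCount);
    long floor = ceiling / 2;
    long jitter = (long) (random.nextDouble() * (ceiling - floor + 1));
    return Math.min(ceiling, floor + jitter);
  }

  /** Runs attempt once, then up to maxRetries more times, sleeping before each retry. */
  public boolean retry(BooleanSupplier attempt, int maxRetries) {
    if (attempt.getAsBoolean()) {
      return true;
    }
    for (int retryCount = 0; retryCount < maxRetries; retryCount++) {
      sleeper.accept(nextSleepMsec(retryCount));
      if (attempt.getAsBoolean()) {
        return true;
      }
    }
    return false; // the caller decides how to surface the ultimate failure
  }
}
```

With this shape, a caller would build the helper with BackoffSketch.withThreadSleep(100, 2000) (both values are illustrative) and pass retry() a supplier that returns true once the operation has fully succeeded. For a batched DynamoDB write, that supplier would presumably resubmit whatever the previous call reported as unprocessed. Capping the number of retries keeps the "ultimate failure" part of the policy from the issue description.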