[
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aaron Fabbri updated HADOOP-13904:
----------------------------------
Attachment: HADOOP-13904-HADOOP-13345.001.patch
Attaching v1 patch. It adds new scale tests for DynamoDBMetadataStore and
LocalMetadataStore. I think we should get HADOOP-13589 in first, and I am
happy to rebase this when that is committed.
The included change to the docs (s3guard.md) describes a configuration I used
to reliably trigger DynamoDB throttling. I observed both a significant slowdown
in test execution and WriteThrottle events in the AWS CloudWatch UI.
I also added some instrumentation around our use of DynamoDB's batched write
API, as the docs imply that we need to add our own backoff timers there. The
output looks like this:
{quote}
2017-01-18 15:21:15,930 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore
(DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took
10 retries to complete
2017-01-18 15:21:18,447 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore
(DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took
7 retries to complete
2017-01-18 15:21:20,987 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore
(DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took
6 retries to complete
2017-01-18 15:21:23,530 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore
(DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took
6 retries to complete
2017-01-18 15:21:25,975 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore
(DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took
9 retries to complete
2017-01-18 15:21:28,561 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore
(DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took
6 retries to complete
2017-01-18 15:21:31,037 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore
(DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took
8 retries to complete
2017-01-18 15:21:33,407 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore
(DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took
5 retries to complete
2017-01-18 15:21:35,685 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore
(DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took
6 retries to complete
{quote}
Next I will dig into the AWS SDK source and/or put timing around the retry
calls to {{batchWriteItemUnprocessed()}} to see whether (A) the SDK is already
doing exponential backoff for us, or (B) we need to add a sleep timer to that
retry loop.
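If it turns out to be (B), the client-side loop would look roughly like the sketch below. This is a hedged illustration, not the patch itself: the constants, the {{retryUnprocessed()}} stub (standing in for the SDK's {{batchWriteItemUnprocessed()}} call), and the halving success model are all assumptions made for demonstration.

```java
/**
 * Sketch of client-side exponential backoff around DynamoDB's batch write
 * API. retryUnprocessed() is a hypothetical stub that stands in for the
 * SDK call and reports how many items remain unprocessed; here it pretends
 * half the items succeed on each round.
 */
public class BatchWriteBackoff {
    static final long BASE_DELAY_MS = 25;   // assumed initial delay
    static final long MAX_DELAY_MS = 1000;  // assumed cap on a single delay
    static final int MAX_RETRIES = 12;      // assumed ultimate-failure bound

    /** Backoff delay for a given attempt: base * 2^attempt, capped. */
    static long delayFor(int attempt) {
        long d = BASE_DELAY_MS << Math.min(attempt, 20);
        return Math.min(d, MAX_DELAY_MS);
    }

    /** Stub for the SDK retry call: half the items succeed each round. */
    static int retryUnprocessed(int unprocessed) {
        return unprocessed / 2;
    }

    /** Writes a batch of 'items' items; returns how many retries it took. */
    static int writeWithBackoff(int items) throws InterruptedException {
        int unprocessed = retryUnprocessed(items); // initial batch write
        int retries = 0;
        while (unprocessed > 0 && retries < MAX_RETRIES) {
            Thread.sleep(delayFor(retries));       // back off, then retry
            unprocessed = retryUnprocessed(unprocessed);
            retries++;
        }
        return retries;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("delay(0)=" + delayFor(0));   // 25
        System.out.println("delay(8)=" + delayFor(8));   // capped at 1000
        System.out.println("retries(100)=" + writeWithBackoff(100));
    }
}
```

With the halving model above, a 100-item batch completes in 6 retries, in the same ballpark as the 5-10 retries the log output shows, which is why an uncapped or missing sleep between rounds would matter under sustained throttling.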
> DynamoDBMetadataStore to handle DDB throttling failures through retry policy
> ----------------------------------------------------------------------------
>
> Key: HADOOP-13904
> URL: https://issues.apache.org/jira/browse/HADOOP-13904
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Steve Loughran
> Assignee: Aaron Fabbri
> Attachments: HADOOP-13904-HADOOP-13345.001.patch
>
>
> When you overload DDB, you get error messages warning of throttling, [as
> documented by
> AWS|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.MessagesAndCodes]
> Reduce load on DDB by doing a table lookup before the create, then, in table
> create/delete operations and in get/put actions, recognise the error codes
> and retry using an appropriate retry policy (exponential backoff + ultimate
> failure)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)