[ https://issues.apache.org/jira/browse/HADOOP-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026111#comment-17026111 ]
Steve Loughran commented on HADOOP-16823:
-----------------------------------------
There's a mismatch between prepaid IO and the actual load; ITestDynamoDBMetadataStoreScale is the example of this.
Special callout for the test setup:
{code}
[ERROR] test_070_putDirMarker(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 314.323 s <<< ERROR!
org.apache.hadoop.fs.s3a.AWSServiceThrottledException: getVersionMarkerItem on ../VERSION: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: LUKART1RQBVKV0T7BPURUN95QVVV4KQNSO5AEMVJF66Q9ASUAAJG): The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: LUKART1RQBVKV0T7BPURUN95QVVV4KQNSO5AEMVJF66Q9ASUAAJG)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.createMetadataStore(ITestDynamoDBMetadataStoreScale.java:152)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.setup(ITestDynamoDBMetadataStoreScale.java:162)
Caused by: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: LUKART1RQBVKV0T7BPURUN95QVVV4KQNSO5AEMVJF66Q9ASUAAJG)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.createMetadataStore(ITestDynamoDBMetadataStoreScale.java:152)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.setup(ITestDynamoDBMetadataStoreScale.java:162)
{code}
Note how long the retries ran there: over 314 seconds elapsed. We were backing off hard, but it still failed on us.
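For the scale tests, the backoff can be stretched further through the existing S3Guard retry options; a minimal sketch, with purely illustrative values:
{code}
import org.apache.hadoop.conf.Configuration;

public class LongerBackoff {
  public static Configuration create() {
    Configuration conf = new Configuration();
    // allow more attempts before AWSServiceThrottledException surfaces
    conf.setInt("fs.s3a.s3guard.ddb.max.retries", 20);
    // and a longer initial delay between throttle retries
    conf.set("fs.s3a.s3guard.ddb.throttle.retry.interval", "200ms");
    return conf;
  }
}
{code}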
Looking at the AWS metrics, part of the fun is the way bursty traffic is handled: you may get your capacity at the time of the initial load, but get blocked afterwards. That is: the throttling may not happen under load, but the next time a low-load API call is made.
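As a back-of-envelope model of why (DynamoDB accrues up to ~300s of unused capacity for bursts; every other number here is invented):
{code}
public class BurstModel {
  public static void main(String[] args) {
    double provisionedRcu = 100;                  // prepaid read capacity/s
    double burstBucket = provisionedRcu * 300;    // ~300s of unused capacity
    double testLoad = 600;                        // reads/s during the burst
    double netDrain = testLoad - provisionedRcu;  // bucket drain rate
    // the burst is absorbed for a while; the calls *after* it pay the price
    System.out.printf("burst absorbed for ~%.0fs, then throttling starts%n",
        burstBucket / netDrain);
  }
}
{code}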
Also, S3GuardTableAccess isn't retrying, so some code in the tests and in the purge/dump-table entry points goes on to fail when throttling happens while iterating through scans. Fix: you can ask a DDBMetastore to wrap your scan with one bound to its retry logic and metrics... plus use of this where appropriate. A sketch of the idea follows.
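This is only a sketch of the shape of that wrapper; WrappedScan is a hypothetical name, and the real version would bind to the store's existing retry policy and metrics rather than sleeping inline:
{code}
import java.util.Iterator;
import java.util.function.Supplier;

import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

/** Retry each step of a DDB scan so mid-scan throttling doesn't kill callers. */
public class WrappedScan<T> implements Iterator<T> {
  private final Iterator<T> inner;
  private final int maxRetries;
  private final long baseDelayMs;

  public WrappedScan(Iterator<T> inner, int maxRetries, long baseDelayMs) {
    this.inner = inner;
    this.maxRetries = maxRetries;
    this.baseDelayMs = baseDelayMs;
  }

  @Override
  public boolean hasNext() {
    return retry(inner::hasNext);
  }

  @Override
  public T next() {
    return retry(inner::next);
  }

  /** Exponential backoff around one step of the scan (unbounded in this sketch). */
  private <R> R retry(Supplier<R> op) {
    for (int attempt = 0; ; attempt++) {
      try {
        return op.get();
      } catch (ProvisionedThroughputExceededException e) {
        if (attempt >= maxRetries) {
          throw e;
        }
        try {
          Thread.sleep(baseDelayMs << attempt);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw e;
        }
      }
    }
  }
}
{code}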
ITestDynamoDBMetadataStoreScale is really slow; either these changes make it worse, or it has always been really slow and we haven't noticed because it was happening during the (slow) parallel test runs. Proposed: we review it, look at what we want it to show, and then see if we can make things fail faster.
> Manage S3 Throttling exclusively in S3A client
> ----------------------------------------------
>
> Key: HADOOP-16823
> URL: https://issues.apache.org/jira/browse/HADOOP-16823
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
>
> Currently AWS S3 throttling is initially handled in the AWS SDK, only
> reaching the S3 client code after it has given up.
> This means we don't always directly observe when throttling is taking place.
> Proposed:
> * disable throttling retries in the AWS client library (see the sketch after this list)
> * add a quantile for the S3 throttle events, as DDB has
> * isolate counters of s3 and DDB throttle events to classify issues better
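> A minimal sketch of that first point, assuming the AWS SDK v1 ClientConfiguration which S3A builds; not the final patch:
> {code}
> import com.amazonaws.ClientConfiguration;
>
> public class SdkThrottlingOff {
>   public static ClientConfiguration create() {
>     return new ClientConfiguration()
>         // stop the SDK retrying 503 throttle responses internally, so
>         // every throttle event reaches (and is counted by) the S3A
>         // retry handler
>         .withThrottledRetries(false);
>   }
> }
> {code}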
> Because we are taking over the AWS retries, we will need to expand the
> initial delay on retries and the number of retries we should support before
> giving up.
> Also: should we log throttling events? It could be useful, but there is a
> risk of log overload, especially if many threads in the same process are
> triggering the problem.
> Proposed: log at debug.