[ 
https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564433#comment-16564433
 ] 

Steve Loughran commented on HADOOP-15426:
-----------------------------------------

See also: 
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html

States that the BatchGetItem/BatchWriteItem calls are really wrappers around a 
series of (possibly parallelised) individual read/write operations, rather than 
anything done server-side as a single bulk operation. Which means that
* switching to something like a batch delete call may not save much, unless the 
individual calls are parallelised.
* the IO load of a batch write of 10 items is 10 x that of a single item.
* Partial failures will happen regularly under an overloaded store: if each 
individual call has a probability X of failing, then the chance of a batch 
operation on Y items not failing at all is (1-X)^Y (e.g. X = 0.05 over Y = 25 
items leaves only ~28% of batches untouched)...except of course, throttle 
failures are not independent events, are they. So in fact, a batch request with 
a large # of items against a table with a small IO capacity will inevitably 
trigger the throttling itself; see the sketch after this list.
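
As a rough illustration of why partial failure has to be treated as the normal 
case: a minimal sketch of the usual AWS SDK for Java (v1) pattern of 
resubmitting the unprocessed items of a batch write with exponential backoff. 
This is not the S3Guard code; the table name "s3guard-demo", the attribute 
names and the backoff values are all made up for the example, and a request 
which is throttled outright still surfaces as 
ProvisionedThroughputExceededException, which this sketch doesn't catch.

{code}
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult;
import com.amazonaws.services.dynamodbv2.model.PutRequest;
import com.amazonaws.services.dynamodbv2.model.WriteRequest;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class BatchWriteWithBackoff {

  public static void main(String[] args) throws InterruptedException {
    AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();

    // Build a batch of 25 small put requests (25 is the per-call limit).
    // Table and attribute names are hypothetical, purely for illustration.
    List<WriteRequest> writes = new ArrayList<>();
    for (int i = 0; i < 25; i++) {
      writes.add(new WriteRequest(new PutRequest(Collections.singletonMap(
          "path", new AttributeValue("/test/dir/file-" + i)))));
    }
    Map<String, List<WriteRequest>> pending =
        Collections.singletonMap("s3guard-demo", writes);

    long backoffMillis = 100;
    // Each round resubmits whatever the previous round failed to process;
    // under throttling, a non-empty "unprocessed" map is the expected case.
    while (!pending.isEmpty()) {
      BatchWriteItemResult result = ddb.batchWriteItem(
          new BatchWriteItemRequest().withRequestItems(pending));
      pending = result.getUnprocessedItems();
      if (!pending.isEmpty()) {
        Thread.sleep(backoffMillis);
        backoffMillis = Math.min(backoffMillis * 2, 10_000);  // exponential backoff
      }
    }
  }
}
{code}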

h2. Large batch operations against a table with a small allocated IO capacity 
will inevitably trigger throttling; partial failures will be the normal 
outcome, not an outlier event.

Which means it's critical that these events get reported to the FS, and that we 
don't go overboard in reporting throttling events. They will happen; we need to 
recover from them everywhere, and we don't want to fill the logs telling people 
this is happening.
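
A retry handler of roughly this shape is what I have in mind; it's only a 
sketch with hypothetical names and limits, not the actual S3Guard code. The key 
points are that the throttle event is counted into a statistic the FS can 
surface, and that it's logged at DEBUG rather than WARN so the logs don't fill 
up with expected events.

{code}
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class ThrottleRetryer {
  private static final Logger LOG =
      LoggerFactory.getLogger(ThrottleRetryer.class);

  // Counter surfaced through the filesystem statistics rather than the log.
  private final AtomicLong throttleEvents = new AtomicLong();

  public <T> T retry(String description, Callable<T> operation) throws Exception {
    long sleepMillis = 100;
    final int maxAttempts = 10;               // hypothetical limit
    for (int attempt = 1; ; attempt++) {
      try {
        return operation.call();
      } catch (ProvisionedThroughputExceededException e) {
        throttleEvents.incrementAndGet();
        if (attempt >= maxAttempts) {
          throw e;                            // give up; let the caller see it
        }
        // Throttling is expected under load: log quietly, back off, retry.
        LOG.debug("Throttled on {} (attempt {}), retrying in {} ms",
            description, attempt, sleepMillis);
        Thread.sleep(sleepMillis
            + ThreadLocalRandom.current().nextLong(sleepMillis));  // jitter
        sleepMillis = Math.min(sleepMillis * 2, 10_000);
      }
    }
  }

  public long getThrottleEventCount() {
    return throttleEvents.get();
  }
}
{code}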

> Make S3guard client resilient to DDB throttle events and network failures
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-15426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15426
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>         Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, 
> HADOOP-15426-003.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot 
> 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png, Screen 
> Shot 2018-07-27 at 14.07.38.png, 
> org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>       at 
> {code}
> We should be able to handle this. Note it's a 400 "bad things happened" error 
> though, not the 503 we get from S3.
> h3. We need a retry handler for DDB throttle operations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
