[ https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564433#comment-16564433 ]
Steve Loughran commented on HADOOP-15426:
-----------------------------------------

See also: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html

That page states that the BatchGetItem/BatchWriteItem calls are really wrappers around a series of (parallelised?) GET/POST operations, rather than anything done server-side in some bulk operation. Which means that:

* switching to things like a batch delete call may not save much, unless the calls are parallelised.
* the IO load of a batch write of 10 items is 10 * the load of 1 item.
* Partial failures will happen regularly under an overloaded store: if each API call has a probability X of failing, then the chance of a batch operation on Y items not failing is (1-X)^Y ... except of course, throttle failures are not independent events, are they. (Even treating them as independent: with X = 2% and Y = 25 items, only (0.98)^25 ≈ 60% of batches complete without a single failure.) So in fact, a batch request of a large # of items against a table with a small IO capacity will inevitably trigger the throttling itself.

h2. Large batch operations against a table with a small allocated IO capacity will inevitably trigger throttling, and hence partial failures will be the normal outcome, not an outlier event.

Which means that it's critical that the events get reported to the FS, and that we don't go overboard in reporting throttling events. They will happen, we need to recover from them everywhere, and we don't want to fill the logs telling people this is happening. (Sketches of a batch retry loop and of low-noise throttle reporting are appended below.)


> Make S3guard client resilient to DDB throttle events and network failures
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-15426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15426
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>         Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, HADOOP-15426-003.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png, Screen Shot 2018-07-27 at 14.07.38.png, org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
> 	at
> {code}
> We should be able to handle this. It's a 400 "bad things happened" error though, not the 503 from S3.
> h3. We need a retry handler for DDB throttle operations
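
On the "retry handler for DDB throttle operations" point: a minimal sketch of what resubmitting the unprocessed subset of a BatchWriteItem call with backoff could look like. This is illustrative only, not the S3Guard implementation; it assumes the AWS SDK v1 client, and the class/method names, attempt limit and backoff values are made up here.

{code}
// Sketch only: resubmit whatever DDB reports as unprocessed, backing off
// between attempts. Names, attempt limit and backoff values are illustrative.
import java.util.List;
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;
import com.amazonaws.services.dynamodbv2.model.WriteRequest;

public class BatchWriteRetry {

  /** Write a batch; retry the unprocessed subset until done or out of attempts. */
  public static void writeWithRetry(AmazonDynamoDB ddb,
      Map<String, List<WriteRequest>> items,
      int maxAttempts) throws InterruptedException {
    Map<String, List<WriteRequest>> pending = items;
    long backoffMs = 100;
    for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
      try {
        BatchWriteItemResult result = ddb.batchWriteItem(
            new BatchWriteItemRequest().withRequestItems(pending));
        // Partial failure: only the throttled subset comes back unprocessed.
        pending = result.getUnprocessedItems();
      } catch (ProvisionedThroughputExceededException e) {
        // Whole call throttled: everything still pending must be retried.
      }
      if (!pending.isEmpty()) {
        Thread.sleep(backoffMs);
        backoffMs = Math.min(backoffMs * 2, 5000);
      }
    }
    if (!pending.isEmpty()) {
      throw new IllegalStateException(
          "Batch write still throttled after " + maxAttempts + " attempts");
    }
  }
}
{code}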
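
And for the reporting side, a similarly illustrative sketch (names invented here) of counting throttle events so they can be surfaced as FS statistics, while warning only once so the logs don't fill up:

{code}
// Sketch only: count throttle events for the filesystem statistics and warn
// once; subsequent events go to debug. Class and method names are illustrative.
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ThrottleTracker {
  private static final Logger LOG = LoggerFactory.getLogger(ThrottleTracker.class);

  private final AtomicLong throttleEvents = new AtomicLong();
  private final AtomicBoolean warned = new AtomicBoolean(false);

  /** Record one throttle event; warn only the first time. */
  public void noteThrottled(String operation) {
    throttleEvents.incrementAndGet();
    if (warned.compareAndSet(false, true)) {
      LOG.warn("DynamoDB throttled {}; further throttle events logged at debug only",
          operation);
    } else {
      LOG.debug("DynamoDB throttled {}", operation);
    }
  }

  /** Expose the count so it can be published as an FS statistic. */
  public long getThrottleEventCount() {
    return throttleEvents.get();
  }
}
{code}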