[ https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560356#comment-16560356 ]
Steve Loughran commented on HADOOP-15426:
-----------------------------------------

HADOOP-15426 patch 003. Ready for review & use.

Testing
* DDB tables aren't resized any more; we just rely on read <= 10 and write <= 15 (writes being more expensive and all).
* Test more of the metastore operations by making some of the inner ops package-private; this guarantees that higher-level ops which make multiple DDB calls have all their invocations wrapped.
* Final test in the sequence verifies that the statistics of the FS are updated too.
* Test timeout for tests in {{ITestS3AFileSystemContract}} set to {{S3ATestConstants.S3A_TEST_TIMEOUT}}, as is done elsewhere (otherwise there are intermittent timeouts in the read=6/write=6 test runs).

Production code
* Ability to bind to an S3A filesystem factored out into a package-private method, so we can attach the FS to the new metastore instance for testing statistic propagation.
* Review/update the @Retry tags.
* Make sure there's no double-wrapping of retries: the retry routines should be as close to the DDB invocations as possible (see the first sketch after the quoted issue text below).
* Generate a mock AWS SDK exception for throttling, so I can wrap AWSServiceThrottledException around it and have a meaningful exception message (second sketch below).

Metrics aren't wired up, though, so we don't get any retry stats from within the SDK. Looked at it, decided it was hard work.

Testing: S3 US-west-1 with capacity read=6, write=6; everything worked (eventually!), excluding the assumed-role tests, which failed with permissions. That's fixed by HADOOP-15883.

> Make S3guard client resilient to DDB throttle events and network failures
> -------------------------------------------------------------------------
>
>                  Key: HADOOP-15426
>                  URL: https://issues.apache.org/jira/browse/HADOOP-15426
>              Project: Hadoop Common
>           Issue Type: Sub-task
>           Components: fs/s3
>     Affects Versions: 3.1.0
>             Reporter: Steve Loughran
>             Assignee: Steve Loughran
>             Priority: Blocker
>          Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, HADOOP-15426-003.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png, Screen Shot 2018-07-27 at 14.07.38.png, org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
> Managed to create this on a parallel test run:
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file:
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
> The level of configured provisioned throughput for the table was exceeded.
> Consider increasing your provisioning level with the UpdateTable API.
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of
> configured provisioned throughput for the table was exceeded. Consider
> increasing your provisioning level with the UpdateTable API. (Service:
> AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
> 	at
> {code}
> We should be able to handle this. It's a 400 "bad things happened" error though, not the 503 from S3.
> h3. We need a retry handler for DDB throttle operations
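To make the "as close to the DDB invocations as possible" point concrete, here is a minimal sketch of that retry pattern, not the code in the patch: one helper applied directly around a single SDK call, backing off on the throttling exception. The class name, attempt limit and backoff constants ({{DdbRetrier}}, {{MAX_ATTEMPTS}}, {{BASE_DELAY_MS}}) are illustrative assumptions; the real patch uses the existing S3A retry machinery rather than a standalone helper.

{code}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;

import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

/** Illustrative helper: retry a single DynamoDB call on throttling. */
public final class DdbRetrier {

  private static final int MAX_ATTEMPTS = 9;       // assumed limit, not from the patch
  private static final long BASE_DELAY_MS = 100;   // assumed base backoff

  private DdbRetrier() {
  }

  /**
   * Invoke one DynamoDB operation, retrying with exponential backoff when the
   * SDK reports provisioned-throughput throttling. Because the wrapper sits
   * directly around the SDK call, higher-level metastore operations that make
   * several DDB calls are never retried twice.
   */
  public static <T> T retryOnThrottle(String description, Callable<T> operation)
      throws IOException {
    ProvisionedThroughputExceededException lastThrottle = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        return operation.call();
      } catch (ProvisionedThroughputExceededException e) {
        // throttled: remember the failure and back off before the next attempt
        lastThrottle = e;
        try {
          Thread.sleep(BASE_DELAY_MS << (attempt - 1));
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          InterruptedIOException iioe =
              new InterruptedIOException("Interrupted during " + description);
          iioe.initCause(ie);
          throw iioe;
        }
      } catch (IOException e) {
        throw e;
      } catch (Exception e) {
        throw new IOException(description + ": " + e, e);
      }
    }
    throw new IOException(description + " throttled after " + MAX_ATTEMPTS
        + " attempts", lastThrottle);
  }
}
{code}

A caller wraps exactly one SDK invocation, e.g. the DeleteItem request for a single path; a composite operation that issues several DDB requests then retries each request independently instead of re-running the whole sequence, which is what the "no double-wrapping" bullet is guarding against.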
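And a sketch of the "mock AWS SDK exception for throttling" item: build a ProvisionedThroughputExceededException by hand, fill in the fields the error-translation path looks at, and wrap it so a test can assert on a meaningful message. The AWSServiceThrottledException(operation, cause) constructor used here is an assumption about the S3A exception class; the real test wiring may differ.

{code}
import com.amazonaws.AmazonServiceException.ErrorType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

import org.apache.hadoop.fs.s3a.AWSServiceThrottledException;

/** Illustrative factory for a fake DynamoDB throttling exception. */
public class MockThrottleFactory {

  /** Create a throttling exception shaped like the one the SDK raises. */
  public static ProvisionedThroughputExceededException mockThrottle(String message) {
    ProvisionedThroughputExceededException ex =
        new ProvisionedThroughputExceededException(message);
    ex.setErrorCode("ProvisionedThroughputExceededException");
    ex.setStatusCode(400);                 // DDB throttling is a 400, not S3's 503
    ex.setServiceName("AmazonDynamoDBv2");
    ex.setErrorType(ErrorType.Client);
    return ex;
  }

  public static void main(String[] args) {
    ProvisionedThroughputExceededException cause = mockThrottle(
        "The level of configured provisioned throughput for the table was exceeded");
    // Wrap the mock the way the S3A client reports throttling, so a test can
    // check that the operation name and the underlying cause both show up.
    AWSServiceThrottledException wrapped =
        new AWSServiceThrottledException("delete on s3a://bucket/path", cause);
    System.out.println(wrapped.getMessage());
  }
}
{code}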