[ https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560356#comment-16560356 ]
Steve Loughran commented on HADOOP-15426:
-----------------------------------------

HADOOP-15426 patch 003. Ready for review & use.

Testing
* DDB tables aren't resized any more; we just rely on read <= 10 and write <= 15 (writes being more expensive and all).
* Test more of the metastore operations by making some of the inner ops package-private; this guarantees that higher-level ops which make multiple DDB calls have all their invocations wrapped.
* Final test in the sequence verifies that the statistics of the FS are updated too.
* Test timeout for tests in {{ITestS3AFileSystemContract}} set to {{S3ATestConstants.S3A_TEST_TIMEOUT}}, as is done elsewhere (otherwise there are intermittent timeouts in the read=6/write=6 test runs).

Production code
* Ability to bind to an S3A filesystem factored out into a package-private method, so we can attach the FS to the new metastore instance for testing statistic propagation.
* Review/update the @Retry tags.
* Make sure there's no double-wrapping of retries: the retry routines should be as close to the DDB invocations as possible (see the first sketch after the quoted issue text below).
* Generate a mock AWS SDK exception for throttling, so I can wrap AWSServiceThrottledException around it and have a meaningful exception message (second sketch below).

Metrics aren't wired up, though, so we don't get any retry stats from within the SDK. Looked at it, decided it was hard work.

Testing: S3 US-west-1 with capacity read=6, write=6; everything worked (eventually!), excluding the assumed-role tests, which failed with permissions. That's fixed by HADOOP-15883.

> Make S3guard client resilient to DDB throttle events and network failures
> -------------------------------------------------------------------------
>
>                  Key: HADOOP-15426
>                  URL: https://issues.apache.org/jira/browse/HADOOP-15426
>              Project: Hadoop Common
>           Issue Type: Sub-task
>           Components: fs/s3
>     Affects Versions: 3.1.0
>             Reporter: Steve Loughran
>             Assignee: Steve Loughran
>             Priority: Blocker
>          Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, HADOOP-15426-003.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png, Screen Shot 2018-07-27 at 14.07.38.png, org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale-output.txt
>
> Managed to create this on a parallel test run:
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file:
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
> The level of configured provisioned throughput for the table was exceeded.
> Consider increasing your provisioning level with the UpdateTable API.
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of
> configured provisioned throughput for the table was exceeded. Consider
> increasing your provisioning level with the UpdateTable API. (Service:
> AmazonDynamoDBv2; Status Code: 400; Error Code:
> ProvisionedThroughputExceededException; Request ID:
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
> 	at
> {code}
> We should be able to handle this. It's a 400 "bad things happened" error though, not the 503 from S3.
> h3. We need a retry handler for DDB throttle operations
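To make the "as close to the DDB invocations as possible" point concrete, here is a minimal sketch of that retry pattern, not the code in the patch: one helper applied directly around a single SDK call, backing off on the throttling exception. The class name, attempt limit and backoff constants ({{DdbRetrier}}, {{MAX_ATTEMPTS}}, {{BASE_DELAY_MS}}) are illustrative assumptions; the real patch uses the existing S3A retry machinery rather than a standalone helper.

{code}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;

import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

/** Illustrative helper: retry a single DynamoDB call on throttling. */
public final class DdbRetrier {

  private static final int MAX_ATTEMPTS = 9;       // assumed limit, not from the patch
  private static final long BASE_DELAY_MS = 100;   // assumed base backoff

  private DdbRetrier() {
  }

  /**
   * Invoke one DynamoDB operation, retrying with exponential backoff when the
   * SDK reports provisioned-throughput throttling. Because the wrapper sits
   * directly around the SDK call, higher-level metastore operations that make
   * several DDB calls are never retried twice.
   */
  public static <T> T retryOnThrottle(String description, Callable<T> operation)
      throws IOException {
    ProvisionedThroughputExceededException lastThrottle = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        return operation.call();
      } catch (ProvisionedThroughputExceededException e) {
        // throttled: remember the failure and back off before the next attempt
        lastThrottle = e;
        try {
          Thread.sleep(BASE_DELAY_MS << (attempt - 1));
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          InterruptedIOException iioe =
              new InterruptedIOException("Interrupted during " + description);
          iioe.initCause(ie);
          throw iioe;
        }
      } catch (IOException e) {
        throw e;
      } catch (Exception e) {
        throw new IOException(description + ": " + e, e);
      }
    }
    throw new IOException(description + " throttled after " + MAX_ATTEMPTS
        + " attempts", lastThrottle);
  }
}
{code}

A caller wraps exactly one SDK invocation, e.g. the DeleteItem request for a single path; a composite operation that issues several DDB requests then retries each request independently instead of re-running the whole sequence, which is what the "no double-wrapping" bullet is guarding against.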
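And a sketch of the "mock AWS SDK exception for throttling" item: build a ProvisionedThroughputExceededException by hand, fill in the fields the error-translation path looks at, and wrap it so a test can assert on a meaningful message. The AWSServiceThrottledException(operation, cause) constructor used here is an assumption about the S3A exception class; the real test wiring may differ.

{code}
import com.amazonaws.AmazonServiceException.ErrorType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

import org.apache.hadoop.fs.s3a.AWSServiceThrottledException;

/** Illustrative factory for a fake DynamoDB throttling exception. */
public class MockThrottleFactory {

  /** Create a throttling exception shaped like the one the SDK raises. */
  public static ProvisionedThroughputExceededException mockThrottle(String message) {
    ProvisionedThroughputExceededException ex =
        new ProvisionedThroughputExceededException(message);
    ex.setErrorCode("ProvisionedThroughputExceededException");
    ex.setStatusCode(400);                 // DDB throttling is a 400, not S3's 503
    ex.setServiceName("AmazonDynamoDBv2");
    ex.setErrorType(ErrorType.Client);
    return ex;
  }

  public static void main(String[] args) {
    ProvisionedThroughputExceededException cause = mockThrottle(
        "The level of configured provisioned throughput for the table was exceeded");
    // Wrap the mock the way the S3A client reports throttling, so a test can
    // check that the operation name and the underlying cause both show up.
    AWSServiceThrottledException wrapped =
        new AWSServiceThrottledException("delete on s3a://bucket/path", cause);
    System.out.println(wrapped.getMessage());
  }
}
{code}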