[ https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-15426:
------------------------------------
    Status: Patch Available  (was: Open)

Patch 002

* wrap all the DDB ops which I can break by running the integration tests 
in parallel.
* the scale test thread count/op count seems sufficient to trigger failures, and 
with fast bail-out it is not too slow.

The scale tests aren't really verifying that the client recovers, more that 
failures go through our retry logic (and so increment the counters), so the client will 
recover if the settings are right. Running the whole hadoop-aws test suite with 12 threads 
is best for stressing the entire coverage, including things like FS setup/teardown.
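For reference, a 12-thread run of that suite is along the lines of {{mvn verify -Dparallel-tests -DtestsThreadCount=12 -Ds3guard -Ddynamo -Dscale}} from hadoop-aws (exact flags as per the hadoop-aws testing doc for S3Guard + DynamoDB runs).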

Tested: S3 US west with S3Guard. It's really, really slow, but apart from that 
test timeout it is not failing (*). I consider that a success.

(*) Exception: the assumed role tests, as I don't have the HADOOP-15583 patch applied 
here.
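
To show the kind of wrapping meant above (the patch routes the DDB calls through the 
existing S3A retry logic rather than a standalone helper; this is just a sketch, with 
illustrative class/method names and limits, not code from the patch):

{code}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;

import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

public class ThrottleRetrySketch {

  private static final int MAX_ATTEMPTS = 9;       // illustrative limit
  private static final long BASE_DELAY_MS = 100;   // illustrative base delay

  /**
   * Invoke a DynamoDB operation, retrying with (capped) exponential backoff
   * whenever the table reports provisioned-throughput throttling.
   */
  public static <T> T onceWithRetry(String action, Callable<T> operation)
      throws IOException {
    ProvisionedThroughputExceededException lastThrottle = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        return operation.call();
      } catch (ProvisionedThroughputExceededException e) {
        lastThrottle = e;
        // the real retry logic increments a throttle-event counter here,
        // which is what the scale tests look at
        long sleep = BASE_DELAY_MS << Math.min(attempt, 6);
        try {
          Thread.sleep(sleep);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw (IOException) new InterruptedIOException(
              action + " interrupted during backoff").initCause(ie);
        }
      } catch (Exception e) {
        throw new IOException(action + " failed", e);
      }
    }
    throw new IOException(action + ": still throttled after "
        + MAX_ATTEMPTS + " attempts", lastThrottle);
  }
}
{code}

A delete would then go through something like {{onceWithRetry("delete " + path, () -> ddb.deleteItem(request))}} (with {{ddb}} and {{request}} standing in for the client and the delete request), so the throttling is retried and counted rather than surfaced as a failure.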

> Make S3guard client resilient to DDB throttle events and network failures
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-15426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15426
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-15426-001.patch, HADOOP-15426-002.patch, Screen 
> Shot 2018-07-24 at 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png, 
> Screen Shot 2018-07-25 at 16.28.53.png
>
>
> Managed to create this on a parallel test run:
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on 
> s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: 
> com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException:
>  The level of configured provisioned throughput for the table was exceeded. 
> Consider increasing your provisioning level with the UpdateTable API. 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of 
> configured provisioned throughput for the table was exceeded. Consider 
> increasing your provisioning level with the UpdateTable API. (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ProvisionedThroughputExceededException; Request ID: 
> RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
>       at 
> {code}
> We should be able to handle this. It's a 400 "bad things happened" error though, not 
> the 503 from S3.
> h3. We need a retry handler for DDB throttle operations


