Chris Nauroth updated HADOOP-13727:
    Attachment: HADOOP-13727-branch-2.001.patch

I'm attaching patch 001.  We have confirmed through load testing in our EC2 
environment that this patch prevents the throttling problems we had seen.  I 
have also completed a full S3A test run against us-west-2.

* Define {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
{{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
single instance.
* Change credential provider creation logic in {{S3AUtils}} to support use of 
the shared instance, both in the default case and the case that the user has 
configured {{fs.s3a.aws.credentials.provider}}.
* Also change the logic of {{S3AUtils}} for more edge case validation, better 
error messages and better readability (I hope).
* Update site documentation and core-default.xml to describe the new provider.
* Set up a new unit test suite, {{TestS3AAWSCredentialsProvider}}.  There were 
multiple tests from {{ITestS3AAWSCredentialsProvider}} that didn't really need 
full S3 integration, so I've moved them to the new unit test suite.  Now 
they'll run in pre-commit.  I also added new tests for the new functionality 
and new validation logic.
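
For readers who want a picture of the first bullet, here is a minimal sketch of the single-instance idea. It is not the code in the patch. {{MetadataCredentialsProvider}} is a hypothetical stand-in for the SDK's {{InstanceProfileCredentialsProvider}}, so that the example compiles without the AWS SDK, and the counter only simulates calls to the Instance Metadata Service.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the SDK's InstanceProfileCredentialsProvider.
// Each instance keeps its own cache, so N instances mean at least N
// simulated metadata service calls.
class MetadataCredentialsProvider {
    // Counts simulated calls to the metadata service (illustration only).
    static final AtomicInteger METADATA_CALLS = new AtomicInteger();

    private String cached;

    synchronized String getCredentials() {
        if (cached == null) {
            // Simulated remote fetch; a real provider would issue an HTTP request here.
            cached = "token-" + METADATA_CALLS.incrementAndGet();
        }
        return cached;
    }
}

// Assumed shape of the shared subclass: a private constructor plus a static
// accessor, so every caller in the JVM receives the same instance.
final class SharedMetadataCredentialsProvider extends MetadataCredentialsProvider {
    private static final SharedMetadataCredentialsProvider INSTANCE =
        new SharedMetadataCredentialsProvider();

    private SharedMetadataCredentialsProvider() {
    }

    static SharedMetadataCredentialsProvider getInstance() {
        return INSTANCE;
    }
}
```

With this shape, every thread that calls {{SharedMetadataCredentialsProvider.getInstance()}} shares one credential cache, so the simulated fetch happens once no matter how many callers there are.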

As of AWS SDK 1.11.39, the SDK code internally enforces a singleton.  After 
Hadoop upgrades to that version or higher, it's likely that we can remove this 

Also, I have proposed a change to the FileSystem cache logic in HADOOP-13726 
that would have prevented this from surfacing.  That's going to be a much 
riskier change, so I'd still like to proceed with the S3A change here.

> S3A: Reduce high number of connections to EC2 Instance Metadata Service 
> caused by InstanceProfileCredentialsProvider.
> ---------------------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-13727
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13727
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Rajesh Balamohan
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13727-branch-2.001.patch
> When running in an EC2 VM, S3A can make use of 
> {{InstanceProfileCredentialsProvider}} from the AWS SDK to obtain credentials 
> from the EC2 Instance Metadata Service.  We have observed that for a highly 
> multi-threaded application, this may generate a high number of calls to the 
> Instance Metadata Service.  The service may throttle the client by replying 
> with an HTTP 429 response or forcibly closing connections.  We can greatly 
> reduce the number of calls to the service by enforcing that all threads use a 
> single shared instance of {{InstanceProfileCredentialsProvider}}.
