[
https://issues.apache.org/jira/browse/HADOOP-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Nauroth updated HADOOP-13727:
-----------------------------------
Attachment: HADOOP-13727-branch-2.001.patch
I'm attaching patch 001. We have confirmed through load testing in our EC2
environment that this patch prevents the throttling problems we had seen. I
have also completed a full S3A test run against us-west-2.
* Define {{SharedInstanceProfileCredentialsProvider}} as a subclass of
{{InstanceProfileCredentialsProvider}}, which enforces creation of only a
single instance.
* Change credential provider creation logic in {{S3AUtils}} to support use of
the shared instance, both in the default case and in the case where the user
has configured {{fs.s3a.aws.credentials.provider}}.
* Also change the logic of {{S3AUtils}} for more edge-case validation, better
error messages and better readability (I hope).
* Update site documentation and core-default.xml to describe the new provider.
* Set up a new unit test suite, {{TestS3AAWSCredentialsProvider}}. There were
multiple tests from {{ITestS3AAWSCredentialsProvider}} that didn't really need
full S3 integration, so I've moved them to the new unit test suite. Now
they'll run in pre-commit. I also added new tests for the new functionality
and new validation logic.
As of AWS SDK 1.11.39, the SDK code internally enforces a singleton. After
Hadoop upgrades to that version or higher, it's likely that we can remove this
code.
Also, I have proposed a change to the FileSystem cache logic in HADOOP-13726
that would have prevented this from surfacing. That's going to be a much
riskier change, so I'd still like to proceed with the S3A change here.
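For readers who want the shared-instance idea from the first bullet in concrete form, here is a minimal sketch. It is not the code in the attached patch. To stay self-contained it does not depend on the AWS SDK: {{MetadataProviderStandIn}} is a hypothetical stand-in for {{InstanceProfileCredentialsProvider}}, {{SharedProviderSketch}} and {{SharedProviderDemo}} are illustrative names, and the assumption that a provider fetches once and then caches per instance is a simplification made for the sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Demo entry point: many threads, one shared provider. */
class SharedProviderDemo {
    /**
     * Starts threadCount threads that each ask the shared provider for
     * credentials, waits for them, and returns the total number of
     * simulated metadata-service fetches made so far.
     */
    static int fetchesAcrossThreads(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(
                () -> SharedProviderSketch.getInstance().getCredentials());
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return MetadataProviderStandIn.FETCHES.get();
    }

    public static void main(String[] args) {
        System.out.println("simulated metadata fetches: "
            + fetchesAcrossThreads(16));
    }
}

/**
 * Hypothetical stand-in for the SDK's InstanceProfileCredentialsProvider.
 * Assumption: each instance contacts the metadata service on first use and
 * caches the result afterwards.
 */
class MetadataProviderStandIn {
    /** Counts simulated calls to the metadata service across all instances. */
    static final AtomicInteger FETCHES = new AtomicInteger();

    private String cached;

    synchronized String getCredentials() {
        if (cached == null) {
            FETCHES.incrementAndGet();      // simulated remote call
            cached = "temporary-credentials";
        }
        return cached;
    }
}

/** Subclass that allows exactly one instance per JVM. */
final class SharedProviderSketch extends MetadataProviderStandIn {
    private static final SharedProviderSketch INSTANCE =
        new SharedProviderSketch();

    // Private constructor: callers must go through getInstance().
    private SharedProviderSketch() {
    }

    static SharedProviderSketch getInstance() {
        return INSTANCE;
    }
}
```

Because every thread goes through {{getInstance()}}, the per-instance cache is shared, so the number of simulated metadata calls does not grow with the number of threads. If each thread constructed its own provider instead, each one would make its own call.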
> S3A: Reduce high number of connections to EC2 Instance Metadata Service
> caused by InstanceProfileCredentialsProvider.
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-13727
> URL: https://issues.apache.org/jira/browse/HADOOP-13727
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Rajesh Balamohan
> Assignee: Chris Nauroth
> Attachments: HADOOP-13727-branch-2.001.patch
>
>
> When running in an EC2 VM, S3A can make use of
> {{InstanceProfileCredentialsProvider}} from the AWS SDK to obtain credentials
> from the EC2 Instance Metadata Service. We have observed that for a highly
> multi-threaded application, this may generate a high number of calls to the
> Instance Metadata Service. The service may throttle the client by replying
> with an HTTP 429 response or forcibly closing connections. We can greatly
> reduce the number of calls to the service by enforcing that all threads use a
> single shared instance of {{InstanceProfileCredentialsProvider}}.