[
https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558788#comment-16558788
]
Steve Loughran commented on HADOOP-14237:
-----------------------------------------
Revisiting this:
* What was the use case which triggered this? You were trying to log in and
things were failing?
* And this was in EC2, so you were using the IAM role auth, which does an HTTP
GET?
* Can we have a stack trace?
We could address this by having our own subclass of
{{InstanceProfileCredentialsProvider}} whose {{getCredentials}} retries on
whatever error gets raised by the service. That'd have to be a very different
retry policy from {{S3ARetryPolicy}}, which tries to reconnect on
network/connection-refused errors; we will want to fail fast there.
All I really need to know is the error raised & error text, and we can recover
from failures here with retry & backoff.
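A rough sketch of the retry & backoff loop such a subclass could wrap around its credential fetch. This deliberately avoids the AWS SDK so it stands alone: {{CredentialRetry}}, {{withBackoff}} and their parameters are hypothetical names, and which exceptions count as retryable is left to the caller, since the actual error raised by the endpoint is still unknown.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper: retry a fetch with exponential backoff. */
final class CredentialRetry {
  private CredentialRetry() {}

  /**
   * Calls fetch until it succeeds, the error is not retryable,
   * or maxAttempts calls have been made.
   */
  static <T> T withBackoff(Callable<T> fetch,
                           Predicate<Exception> retryable,
                           int maxAttempts,
                           long initialDelayMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long delay = initialDelayMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return fetch.call();
      } catch (Exception e) {
        // Fail fast on anything not classed as retryable, or once
        // the attempt budget is used up.
        if (attempt >= maxAttempts || !retryable.test(e)) {
          throw e;
        }
        Thread.sleep(delay);
        delay *= 2; // double the wait before the next attempt
      }
    }
  }
}
```

In a real provider the {{fetch}} argument would be the superclass {{getCredentials}} call, and the predicate would accept only the throttling-style failures, so connection-refused errors still surface immediately.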
Looks like {{com.amazonaws.retry.RetryUtils}} has a predicate to see if an
exception is for throttling. If we use that in {{translateException}}, it'll
work.
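To illustrate the idea only: a translation step can map throttling failures to their own {{IOException}} subtype, which a retry policy can then single out. Everything below is a stand-in so the sketch compiles without the SDK. {{ServiceFailure}}, {{ThrottledIOException}} and {{ThrottleTranslation}} are hypothetical, and the status and error codes are assumptions, not the SDK's actual list; real code would call the {{RetryUtils}} predicate instead of {{isThrottling}}.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for an SDK service exception. */
class ServiceFailure extends RuntimeException {
  final int statusCode;
  final String errorCode;

  ServiceFailure(String message, int statusCode, String errorCode) {
    super(message);
    this.statusCode = statusCode;
    this.errorCode = errorCode;
  }
}

/** Hypothetical IOException subtype marking a throttled request. */
class ThrottledIOException extends IOException {
  ThrottledIOException(String message, Throwable cause) {
    super(message, cause);
  }
}

final class ThrottleTranslation {
  // Assumed, illustrative error codes only.
  private static final Set<String> THROTTLE_CODES =
      new HashSet<>(Arrays.asList("Throttling", "SlowDown"));

  private ThrottleTranslation() {}

  /** Stand-in for the SDK predicate; the codes checked are assumptions. */
  static boolean isThrottling(ServiceFailure e) {
    return e.statusCode == 429
        || e.statusCode == 503
        || THROTTLE_CODES.contains(e.errorCode);
  }

  /** Maps a service failure to an IOException, keeping the cause. */
  static IOException translate(String operation, ServiceFailure e) {
    String text = operation + ": " + e.getMessage();
    return isThrottling(e)
        ? new ThrottledIOException(text, e)
        : new IOException(text, e);
  }
}
```

Keeping the original exception as the cause means the error raised and its text still reach the logs after translation.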
> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> -----------------------------------------------------------------------
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
> Assignee: Kazuyuki Tanimura
> Priority: Major
>
> When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails
> to get the instance profile credentials, and eventually all jobs on the
> cluster fail. Since a number of S3A clients (all mappers and reducers) try to
> get the credentials, the AWS credential endpoint starts responding with 5xx
> and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java is sort of trying to solve it,
> but it still does not share the credentials with other EC2 nodes / JVM
> processes.
> This issue prevents users from creating Hadoop clusters on EC2.