[ 
https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558788#comment-16558788
 ] 

Steve Loughran commented on HADOOP-14237:
-----------------------------------------

Revisiting this

* What was the use case that triggered this? You were trying to log in and 
things were failing?
* And this was in EC2, so you were using the IAM role auth, which does an HTTP 
GET?
* Can we have a stack trace?

We could address this by having our own subclass of 
{{InstanceProfileCredentialsProvider}} whose {{getCredentials}} retries on 
whatever error gets raised by the service. That'd have to be a very different 
retry policy from {{S3ARetryPolicy}}, which tries to reconnect on 
network/connection refused errors. We will want to fail fast there. 

All I really need to know is the error raised & error text, and we can recover 
from failures here with retry & backoff.
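
A minimal sketch of what "retry & backoff" could look like, using only the JDK. 
{{RetryingFetch}}, {{fetchWithRetry}} and all of its parameters are 
hypothetical names made up for illustration; none of this is existing Hadoop or 
AWS SDK code. The predicate decides which failures are worth retrying, so 
anything it rejects (for example a connection refused) fails fast.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper; not part of Hadoop or the AWS SDK. */
final class RetryingFetch {

  private RetryingFetch() {
  }

  /**
   * Invoke the fetch, retrying with exponential backoff for as long as
   * the predicate classifies the failure as retryable.
   *
   * @param fetch operation to run, e.g. a credential lookup
   * @param isRetryable true for failures worth retrying (e.g. throttling)
   * @param maxAttempts total number of attempts, at least 1
   * @param initialDelayMillis sleep before the second attempt; doubles after
   */
  static <T> T fetchWithRetry(Callable<T> fetch,
      Predicate<Exception> isRetryable,
      int maxAttempts,
      long initialDelayMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long delay = initialDelayMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return fetch.call();
      } catch (Exception e) {
        // Fail fast when out of attempts or the error is not retryable.
        if (attempt >= maxAttempts || !isRetryable.test(e)) {
          throw e;
        }
        Thread.sleep(delay);
        delay *= 2;
      }
    }
  }
}
```

In a provider subclass, the {{fetch}} argument would presumably wrap the parent 
{{getCredentials}} call, and the predicate would be built once the actual error 
raised by the endpoint is known.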

Looks like {{com.amazonaws.retry.RetryUtils}} has a predicate to see if an 
exception is for throttling. If we use that in {{translateException}}, it'll 
work.



> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-14237
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14237
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
>         Environment: EC2, AWS
>            Reporter: Kazuyuki Tanimura
>            Assignee: Kazuyuki Tanimura
>            Priority: Major
>
> When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
> getting the instance profile credentials, and eventually all jobs on the 
> cluster fail. Since a number of S3A clients (all mappers and reducers) try to 
> get the credentials, the AWS credential endpoint starts responding with 5xx 
> and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java is sort of trying to solve it, 
> but it still does not share the credentials with other EC2 nodes / JVM 
> processes.
> This issue prevents users from creating Hadoop clusters on EC2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

