[ https://issues.apache.org/jira/browse/HADOOP-19181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848336#comment-17848336 ]

Steve Loughran commented on HADOOP-19181:
-----------------------------------------


Spent some time looking into the AWS SDK with Harshit Gupta and Mukund Thakur

h2. AWS API docs

The AWS docs say callers should retry with backoff on throttling, but they don't 
say which error code to expect. Assume 503, for consistency with other services (S3): 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#instancedata-throttling
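
For reference, a minimal sketch of what "retry with backoff on 503" could look 
like against the instance metadata credentials endpoint. The URL is the documented 
IMDS credentials path; the attempt count and delays are illustrative assumptions, 
not SDK defaults.

{code}
// Sketch only, not SDK code: back off and retry when IMDS returns 503.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ImdsBackoffSketch {
    private static final String CREDS_URL =
        "http://169.254.169.254/latest/meta-data/iam/security-credentials/";

    public static String fetchWithBackoff(int maxAttempts) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(CREDS_URL)).GET().build();
        long delayMillis = 100;                    // initial backoff; an assumption
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 503) {    // throttled: back off, then retry
                Thread.sleep(delayMillis);
                delayMillis *= 2;
                continue;
            }
            return response.body();                // success or non-retryable status
        }
        throw new RuntimeException("still throttled after " + maxAttempts + " attempts");
    }
}
{code}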

h2. v1 SDK Credential collection

Looking at the v1 SDK's com.amazonaws.auth.BaseCredentialsFetcher (sketched below):

* it will probe for credentials whenever 10 minutes have passed since the last check,
* or when the clock has passed the expiry time;
* the refresh-before-expiry window is 15 minutes;
* if credential retrieval fails, it will log and continue with the existing 
  credentials, even if they have expired (no retry).
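
A rough sketch of that decision logic as described above; this is an illustration, 
not the actual BaseCredentialsFetcher source, and fetchFromMetadataService() is a 
placeholder.

{code}
import java.time.Duration;
import java.time.Instant;

public class V1RefreshLogicSketch {
    private static final Duration PROBE_INTERVAL = Duration.ofMinutes(10);
    private static final Duration EXPIRY_MARGIN = Duration.ofMinutes(15);

    private Instant lastProbe = Instant.EPOCH;
    private Instant expiry;              // expiry of the cached credentials; may be null
    private String cachedCredentials;    // stand-in for the real credential object

    boolean needsRefresh(Instant now) {
        return cachedCredentials == null
            || now.isAfter(lastProbe.plus(PROBE_INTERVAL))
            || (expiry != null && now.isAfter(expiry.minus(EXPIRY_MARGIN)));
    }

    String getCredentials(Instant now) {
        if (needsRefresh(now)) {
            try {
                cachedCredentials = fetchFromMetadataService();   // hypothetical fetch
                lastProbe = now;
            } catch (RuntimeException e) {
                // v1 behaviour as described: log and continue with whatever is
                // cached, even if it has expired; no retry here.
                System.err.println("credential refresh failed: " + e);
            }
        }
        return cachedCredentials;
    }

    private String fetchFromMetadataService() {
        throw new UnsupportedOperationException("placeholder for the IMDS GET");
    }
}
{code}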


h2. V2 SDK
* There is no attempt to retry on a GET of credentials from EC2 instances 
(InstanceProfileCredentialsProvider)
* There is a retry policy for container credentials; the GET is retried 5 times 
with no delay on any 5xx error.

When does prefetch take place?

{code}
private Instant prefetchTime(Instant expiration) {
    Instant now = clock.instant();

    if (expiration == null) {
        return now.plus(60, MINUTES);
    }

    Duration timeUntilExpiration = Duration.between(now, expiration);
    if (timeUntilExpiration.isNegative()) {
        // IMDS gave us a time in the past. We're already stale. Don't prefetch.
        return null;
    }

    return now.plus(maximum(timeUntilExpiration.dividedBy(2), Duration.ofMinutes(5)));
}
{code}

If credentials are retrieved with under 5 minutes left until expiry, the prefetch 
time lands after the expiry time, so prefetching will not take place.
Worker processes launched a few minutes before session credential expiry will not 
get any refresh until the credentials are considered stale.
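
To make the arithmetic concrete, a small standalone illustration with an assumed 
3 minutes until expiry:

{code}
// Illustrative arithmetic only: with 3 minutes until expiry, the prefetch time
// computed above (now + max(timeUntilExpiration/2, 5 minutes)) lands 2 minutes
// *after* the credentials have already expired.
import java.time.Duration;
import java.time.Instant;

public class PrefetchArithmetic {
    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant expiration = now.plus(Duration.ofMinutes(3));

        Duration timeUntilExpiration = Duration.between(now, expiration);
        Duration half = timeUntilExpiration.dividedBy(2);            // 90s
        Duration wait = half.compareTo(Duration.ofMinutes(5)) > 0
            ? half : Duration.ofMinutes(5);                          // max(90s, 5m) = 5m
        Instant prefetch = now.plus(wait);

        System.out.println("expires:  " + expiration);
        System.out.println("prefetch: " + prefetch
            + " (after expiry: " + prefetch.isAfter(expiration) + ")");
    }
}
{code}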

When are credentials considered stale?

{code}
return expiration.minusSeconds(1);
{code}

so there's only 1s for a blocking fetch before the credentials expire. Any clock 
drift *or jvm pause* eats into even that window.
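
A trivial illustration of how little headroom that 1s margin leaves; the 2s pause 
is an arbitrary example, not an SDK value:

{code}
import java.time.Duration;
import java.time.Instant;

public class OneSecondWindow {
    public static void main(String[] args) {
        Instant expiration = Instant.now().plus(Duration.ofMinutes(10));
        Instant staleTime = expiration.minusSeconds(1);   // the SDK's 1s stale margin

        // any pause longer than 1s (gc, swap-out, clock drift) before the
        // blocking refresh runs means it starts after the credentials expired
        Instant refreshStart = staleTime.plus(Duration.ofSeconds(2));
        System.out.println("refresh starts after expiry: "
            + refreshStart.isAfter(expiration));
    }
}
{code}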

And if that request fails

{code}
Instant newStaleTime = jitterTime(now, Duration.ofMillis(1), maxStaleFailureJitter(numFailures));
log.warn(() -> "(" + cachedValueName + ") Cached value expiration has been extended to " +
               newStaleTime + " because calling the downstream service failed (consecutive failures: " +
               numFailures + ").", e);

return currentCachedValue.toBuilder()
                         .staleTime(newStaleTime)
                         .build();

{code}

There is no jitter enabled in the prefetch, only in the stale time chosen when a 
retrieval of stale credentials fails.

And that jitter can be up to 10s, even though the credentials expire in 1s.

{code}
private Duration maxStaleFailureJitter(int numFailures) {
    long exponentialBackoffMillis = (1L << numFailures - 1) * 100;
    return ComparableUtils.minimum(Duration.ofMillis(exponentialBackoffMillis), Duration.ofSeconds(10));
}
{code}
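
For illustration, a standalone copy of that formula showing how the cap grows 
with consecutive failures (100ms, 200ms, 400ms, ... reaching the 10s ceiling at 
the 8th failure):

{code}
import java.time.Duration;

public class StaleJitterGrowth {
    static Duration maxStaleFailureJitter(int numFailures) {
        long exponentialBackoffMillis = (1L << numFailures - 1) * 100;
        Duration backoff = Duration.ofMillis(exponentialBackoffMillis);
        return backoff.compareTo(Duration.ofSeconds(10)) < 0
            ? backoff : Duration.ofSeconds(10);
    }

    public static void main(String[] args) {
        for (int failures = 1; failures <= 8; failures++) {
            System.out.println(failures + " failure(s): " + maxStaleFailureJitter(failures));
        }
    }
}
{code}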

A single failure of the GET for any reason is going to return credentials that 
are inevitably out of date.

h3. ContainerCredentialsProvider

This class chooses a different prefetch policy, retaining the v1 SDK's 15-minute 
window before expiry.
{code}
private Instant prefetchTime(Instant expiration) {
    Instant oneHourFromNow = Instant.now().plus(1, ChronoUnit.HOURS);

    if (expiration == null) {
        return oneHourFromNow;
    }

    Instant fifteenMinutesBeforeExpiration = expiration.minus(15, ChronoUnit.MINUTES);

    return ComparableUtils.minimum(oneHourFromNow, fifteenMinutesBeforeExpiration);
}
{code}

It also has a retry policy on failure:

{code}
private static final int MAX_RETRIES = 5;

@Override
public boolean shouldRetry(int retriesAttempted, ResourcesEndpointRetryParameters retryParams) {
    if (retriesAttempted >= MAX_RETRIES) {
        return false;
    }

    Integer statusCode = retryParams.getStatusCode();
    if (statusCode != null && HttpStatusFamily.of(statusCode) == HttpStatusFamily.SERVER_ERROR) {
        return true;
    }

    return retryParams.getException() instanceof IOException;
}
{code}

The retry policy means there is a brief attempt at recovery, without the cache 
jitter
logic getting involved.

This probably makes it more resilient to failures, though if there are load 
problems,
the sequence of 5 GET requests will not help.

Hypothesised failure conditions:

* Many processes are launched so close together that they are all prefetching at 
about the same time. And because the credentials on the same server expire at 
exactly the same time for every process, if the prefetch hasn't taken place the 
refresh only happens once the credentials are considered stale.
* Or: multiple s3a clients to different filesystems in the same process.
* This happens with < 1s to go, so it is brittle to clock drift, process swap-out, 
jvm gc, etc.

Changes to suggest for the SDK:

* InstanceProfileCredentialsProvider.prefetchTime to be reviewed.
* Maybe enable jitter on the cache refresh (prefetch) too, though the jitter 
interval is up to 10s and there is already jitter on the choice of stale expiry.
* Declare stale more than 1s before expiry; in particular, it should be > 10s 
before expiry for the jitter code to get involved (sketched after this list).
* Share the ContainerCredentialsProvider retry policy across both classes, with 
retry on 503.
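
A rough sketch of what the "declare stale earlier" suggestion could look like; 
the 30s headroom is an arbitrary assumption, and this is an illustration of the 
idea rather than a patch against the SDK.

{code}
import java.time.Duration;
import java.time.Instant;

public class StaleTimeSuggestion {
    // headroom is an assumption; the point is that it must exceed the 10s
    // failure jitter so the jitter logic can act before the credentials expire
    private static final Duration STALE_HEADROOM = Duration.ofSeconds(30);

    static Instant staleTime(Instant now, Instant expiration) {
        if (expiration == null) {
            return now.plus(Duration.ofMinutes(60));
        }
        // today the SDK uses expiration.minusSeconds(1); the suggestion is to
        // move this earlier so a blocking refresh has room to retry and jitter
        return expiration.minus(STALE_HEADROOM);
    }
}
{code}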

(Note: all SDK code quoted above is Copyright Amazon.com, Inc. or its affiliates; 
not for incorporation into the ASF codebase.)

Extra loggers to set to debug for anyone trying to debug the SDK:

{code}
software.amazon.awssdk.utils.cache.CachedSupplier
software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider
{code}

> IAMCredentialsProvider throttle failures
> ----------------------------------------
>
>                 Key: HADOOP-19181
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19181
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> Tests report throttling errors in IAM being remapped to noauth and failure
> Again, impala tests, but with multiple processes on same host. this means 
> that HADOOP-18945 isn't sufficient as even if it ensures a singleton instance 
> for a process
> * it doesn't if there are many test buckets (fixable)
> * it doesn't work across processes (not fixable)
> we may be able to 
> * use a singleton across all filesystem instances
> * once we know how throttling is reported, handle it through retries + 
> error/stats collection


