[
https://issues.apache.org/jira/browse/HADOOP-19181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848336#comment-17848336
]
Steve Loughran commented on HADOOP-19181:
-----------------------------------------
Spent some time looking into the AWS SDK with Harshit Gupta and Mukund Thakur
h2. AWS API docs
AWS docs say callers should retry with backoff on throttling, but they don't
say which error code is returned. Assume 503 for consistency with other services (S3):
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#instancedata-throttling
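A minimal sketch of "retry with backoff" around a metadata GET, assuming 503 really is the throttle status; the helper exception type, delays and cap here are invented for illustration:
{code}
import java.io.IOException;
import java.util.concurrent.Callable;

/** Illustrative retry-with-backoff wrapper; assumes 503 is the throttle status code. */
final class ThrottleBackoff {

  /** Hypothetical exception carrying the HTTP status of a failed metadata GET. */
  static final class HttpStatusException extends IOException {
    final int status;
    HttpStatusException(int status) {
      super("HTTP " + status);
      this.status = status;
    }
  }

  static <T> T retryOnThrottle(Callable<T> call, int maxAttempts) throws Exception {
    long sleepMillis = 100;                                // illustrative initial delay
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (HttpStatusException e) {
        if (e.status != 503 || attempt == maxAttempts) {   // only throttling is retried here
          throw e;
        }
        Thread.sleep(sleepMillis);                         // back off before the next attempt
        sleepMillis = Math.min(sleepMillis * 2, 10_000);   // exponential, capped at 10s
      }
    }
  }
}
{code}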
h2. v1 SDK Credential collection
Looking at the v1 SDK com.amazonaws.auth.BaseCredentialsFetcher:
* it probes for credentials whenever it has been 10 minutes since the last check,
* or when the clock has passed the expiry time;
* refresh ahead of expiry kicks in 15 minutes before expiry (the decision logic is sketched below);
* if retrieval fails, it will log and continue with existing credentials, even
if they have expired (no retry).
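A rough sketch of that refresh decision, reconstructed from the behaviour described above rather than copied from the v1 source; all names here are illustrative:
{code}
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

/** Paraphrase of the v1 BaseCredentialsFetcher refresh decision; names are illustrative. */
final class V1RefreshDecision {
  private static final Duration REFRESH_INTERVAL = Duration.ofMinutes(10);  // periodic probe
  private static final Duration EXPIRY_HEADROOM = Duration.ofMinutes(15);   // refresh ahead of expiry

  private final Clock clock;
  private Instant lastRefreshed;          // when credentials were last fetched; null before first fetch
  private Instant credentialsExpiration;  // may be null for non-expiring credentials

  V1RefreshDecision(Clock clock) {
    this.clock = clock;
  }

  boolean needsToLoadCredentials() {
    Instant now = clock.instant();
    if (lastRefreshed == null) {
      return true;                        // nothing cached yet
    }
    if (credentialsExpiration != null
        && now.isAfter(credentialsExpiration.minus(EXPIRY_HEADROOM))) {
      return true;                        // within 15 minutes of expiry (or already past it)
    }
    // otherwise probe again once 10 minutes have passed since the last check
    return Duration.between(lastRefreshed, now).compareTo(REFRESH_INTERVAL) >= 0;
  }
}
{code}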
h2. V2 SDK
* There is no attempt to retry on a GET of credentials from EC2 instances
(InstanceProfileCredentialsProvider)
* There is a retry policy for container credentials; the GET is retried 5 times
with no delay on any 5xx error.
When does prefetch take place?
{code}
private Instant prefetchTime(Instant expiration) {
    Instant now = clock.instant();

    if (expiration == null) {
        return now.plus(60, MINUTES);
    }

    Duration timeUntilExpiration = Duration.between(now, expiration);
    if (timeUntilExpiration.isNegative()) {
        // IMDS gave us a time in the past. We're already stale. Don't prefetch.
        return null;
    }

    return now.plus(maximum(timeUntilExpiration.dividedBy(2), Duration.ofMinutes(5)));
}
{code}
If you get credentials with less than 5 minutes to expiry, the prefetch is
scheduled after the expiry time, so no prefetch takes place.
Worker processes launched a few minutes before session credential expiry will
therefore get no refresh until the credentials are considered stale.
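To make the arithmetic concrete, here is a standalone reproduction of that formula (not SDK code) for credentials expiring four minutes from now:
{code}
import java.time.Duration;
import java.time.Instant;

/** Standalone reproduction of the prefetch formula above, showing the under-5-minutes case. */
public class PrefetchArithmetic {

  static Instant prefetchTime(Instant now, Instant expiration) {
    Duration timeUntilExpiration = Duration.between(now, expiration);
    Duration half = timeUntilExpiration.dividedBy(2);
    Duration wait = half.compareTo(Duration.ofMinutes(5)) > 0 ? half : Duration.ofMinutes(5);
    return now.plus(wait);
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-05-21T10:00:00Z");
    Instant expiration = now.plus(Duration.ofMinutes(4));   // credentials expire in 4 minutes
    // prefetch is scheduled for 10:05:00Z, one minute *after* the 10:04:00Z expiry,
    // so the only refresh path left is the blocking fetch once the value is stale.
    System.out.println("expires:  " + expiration);
    System.out.println("prefetch: " + prefetchTime(now, expiration));
  }
}
{code}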
When are credentials considered stale?
{code}
return expiration.minusSeconds(1);
{code}
so there's only 1s for a blocking fetch before the credentials expire, and any
clock drift *or JVM pause* eats into that.
And if that request fails
{code}
Instant newStaleTime = jitterTime(now, Duration.ofMillis(1), maxStaleFailureJitter(numFailures));
log.warn(() -> "(" + cachedValueName + ") Cached value expiration has been extended to " +
               newStaleTime + " because calling the downstream service failed " +
               "(consecutive failures: " + numFailures + ").", e);

return currentCachedValue.toBuilder()
                         .staleTime(newStaleTime)
                         .build();
{code}
There is no jitter on the prefetch, only on the extension of the stale time when
the fetch of stale credentials fails.
And that extension can be up to 10s, even though the credentials expire in 1s.
{code}
private Duration maxStaleFailureJitter(int numFailures) {
    long exponentialBackoffMillis = (1L << numFailures - 1) * 100;
    return ComparableUtils.minimum(Duration.ofMillis(exponentialBackoffMillis),
                                   Duration.ofSeconds(10));
}
{code}
A single failure of the GET for any reason is going to return credentials that
are inevitably out of date.
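For reference, the bound that formula produces per consecutive-failure count; a standalone reproduction of the method above, not SDK code:
{code}
import java.time.Duration;

/** Reproduces the maxStaleFailureJitter bound to show where the 10s cap kicks in. */
public class StaleJitterBounds {

  static Duration maxStaleFailureJitter(int numFailures) {
    long exponentialBackoffMillis = (1L << numFailures - 1) * 100;   // 100ms, 200ms, 400ms, ...
    return exponentialBackoffMillis < 10_000
        ? Duration.ofMillis(exponentialBackoffMillis)
        : Duration.ofSeconds(10);
  }

  public static void main(String[] args) {
    // 1 -> 100ms, 2 -> 200ms, ... 7 -> 6.4s, 8+ -> capped at 10s
    for (int failures = 1; failures <= 8; failures++) {
      System.out.println(failures + " failure(s): up to "
          + maxStaleFailureJitter(failures).toMillis() + "ms");
    }
  }
}
{code}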
h3. ContainerCredentialsProvider
This class chooses a different prefetch policy, retaining the v1 SDK's 15 minute
refresh window.
{code}
private Instant prefetchTime(Instant expiration) {
    Instant oneHourFromNow = Instant.now().plus(1, ChronoUnit.HOURS);

    if (expiration == null) {
        return oneHourFromNow;
    }

    Instant fifteenMinutesBeforeExpiration = expiration.minus(15, ChronoUnit.MINUTES);
    return ComparableUtils.minimum(oneHourFromNow, fifteenMinutesBeforeExpiration);
}
{code}
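For the same four-minutes-to-expiry case as earlier, this formula behaves very differently; again a standalone reproduction, not SDK code:
{code}
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Reproduces the container provider's prefetch formula for the four-minutes-to-expiry case. */
public class ContainerPrefetchArithmetic {

  static Instant prefetchTime(Instant now, Instant expiration) {
    Instant oneHourFromNow = now.plus(1, ChronoUnit.HOURS);
    Instant fifteenMinutesBeforeExpiration = expiration.minus(15, ChronoUnit.MINUTES);
    return oneHourFromNow.isBefore(fifteenMinutesBeforeExpiration)
        ? oneHourFromNow
        : fifteenMinutesBeforeExpiration;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-05-21T10:00:00Z");
    Instant expiration = now.plus(Duration.ofMinutes(4));
    // 15 minutes before expiry is 09:49:00Z, already in the past, so the prefetch is
    // due immediately rather than being pushed beyond the expiry time.
    System.out.println("prefetch: " + prefetchTime(now, expiration));
  }
}
{code}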
It also has a retry policy on failure:
{code}
private static final int MAX_RETRIES = 5;

@Override
public boolean shouldRetry(int retriesAttempted, ResourcesEndpointRetryParameters retryParams) {
    if (retriesAttempted >= MAX_RETRIES) {
        return false;
    }
    Integer statusCode = retryParams.getStatusCode();
    if (statusCode != null && HttpStatusFamily.of(statusCode) == HttpStatusFamily.SERVER_ERROR) {
        return true;
    }
    return retryParams.getException() instanceof IOException;
}
{code}
The retry policy means there is a brief attempt at recovery, without the cache
jitter logic getting involved.
This probably makes it more resilient to failures, though if there are load
problems, a sequence of 5 back-to-back GET requests will not help.
Hypothesised failure conditions:
* Many processes are launched so close together that they all prefetch at about
the same time; and because the credentials on the same server expire at exactly
the same time for every process, if the prefetch hasn't taken place the fetch
happens when the credentials are considered stale, simultaneously for all of them.
* Or multiple S3A clients to different filesystems in the same process.
* This happens with < 1s to go, so it is brittle to clock drift, process swap-out,
JVM GC etc.
Changes to suggest for the SDK:
* InstanceProfileCredentialsProvider.prefetchTime to be reviewed.
* Maybe enable jitter on the cache refresh, given the jitter interval is up to
10s; but there is already jitter on the choice of stale expiry.
* Declare credentials stale more than 1s before expiry; in particular it should
be > 10s for the jitter code to get involved.
* Share the ContainerCredentialsProvider retry policy across both classes, with
retry on 503 (a hypothetical shape for this is sketched below).
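A hypothetical shape for that shared policy; not SDK code, and the names are invented:
{code}
import java.io.IOException;

/** Hypothetical shared retry predicate for both credential providers; names are invented. */
final class SharedCredentialsRetryPolicy {
  private static final int MAX_RETRIES = 5;

  static boolean shouldRetry(int retriesAttempted, Integer statusCode, Exception failure) {
    if (retriesAttempted >= MAX_RETRIES) {
      return false;
    }
    // 503 (the assumed throttle code) and the rest of the 5xx family are retried;
    // for throttling, some backoff between attempts would also be wanted.
    if (statusCode != null && statusCode / 100 == 5) {
      return true;
    }
    return failure instanceof IOException;   // connection/read failures
  }
}
{code}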
(Note: the SDK code quoted above is Copyright Amazon.com, Inc. or its affiliates,
and is not for incorporation into the ASF codebase.)
Extra loggers to set to debug for anyone trying to debug the SDK:
{code}
software.amazon.awssdk.utils.cache.CachedSupplier
software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider
{code}
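For a log4j.properties-driven setup such as Hadoop's test runs, that would be along the lines of:
{code}
log4j.logger.software.amazon.awssdk.utils.cache.CachedSupplier=DEBUG
log4j.logger.software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider=DEBUG
{code}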
> IAMCredentialsProvider throttle failures
> ----------------------------------------
>
> Key: HADOOP-19181
> URL: https://issues.apache.org/jira/browse/HADOOP-19181
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Priority: Major
>
> Tests report throttling errors in IAM being remapped to noauth and failure
> Again, Impala tests, but with multiple processes on the same host. This means
> that HADOOP-18945 isn't sufficient as even if it ensures a singleton instance
> for a process
> * it doesn't if there are many test buckets (fixable)
> * it doesn't work across processes (not fixable)
> we may be able to
> * use a singleton across all filesystem instances
> * once we know how throttling is reported, handle it through retries +
> error/stats collection