[
https://issues.apache.org/jira/browse/HADOOP-17092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153972#comment-17153972
]
Bilahari T H commented on HADOOP-17092:
---------------------------------------
*New configs added as part of the JIRA*
The exponential retry policy used for the AAD token fetch retries can be tuned
with the following configurations.
* fs.azure.oauth.token.fetch.retry.max.retries: Sets the maximum number of
retries. Default value is 5.
* fs.azure.oauth.token.fetch.retry.min.backoff.interval: Minimum back-off
interval. Added to the retry interval computed from delta backoff. Default
value is 0. Set the interval in milliseconds.
* fs.azure.oauth.token.fetch.retry.max.backoff.interval: Maximum back-off
interval. Default value is 60000 (sixty seconds). Set the interval in
milliseconds.
* fs.azure.oauth.token.fetch.retry.delta.backoff: Back-off interval between
retries. Multiples of this timespan are used for subsequent retry attempts. The
default value is 2.
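
To make the interaction of the four settings concrete, here is a minimal Java
sketch of how a wait interval could be derived from them. It is not the ABFS
implementation: the class name, the method name, the doubling formula, the
absence of random jitter, and the assumption that the delta is in milliseconds
are all illustrative guesses based only on the descriptions above.

```java
/** Illustrative only: NOT the ABFS retry policy implementation. */
final class TokenFetchRetrySketch {

    // Default values as listed in the comment above.
    static final int DEFAULT_MAX_RETRIES = 5;
    static final long DEFAULT_MIN_BACKOFF_MS = 0L;
    static final long DEFAULT_MAX_BACKOFF_MS = 60_000L;
    // Assumption: the delta is expressed in milliseconds.
    static final long DEFAULT_DELTA_BACKOFF_MS = 2L;

    /**
     * Assumed formula: the delta doubles with each retry, the minimum
     * back-off is added, and the result is capped at the maximum back-off.
     */
    static long retryIntervalMillis(int retryCount, long minBackoffMs,
                                    long maxBackoffMs, long deltaBackoffMs) {
        if (retryCount < 1) {
            throw new IllegalArgumentException("retryCount must be >= 1");
        }
        // Bound the exponent so the shift stays in range for realistic values.
        int exponent = Math.min(retryCount - 1, 30);
        long increment = deltaBackoffMs * (1L << exponent);
        return Math.min(minBackoffMs + increment, maxBackoffMs);
    }

    public static void main(String[] args) {
        // Print the wait before each retry when every setting is at its default.
        for (int retry = 1; retry <= DEFAULT_MAX_RETRIES; retry++) {
            long waitMs = retryIntervalMillis(retry, DEFAULT_MIN_BACKOFF_MS,
                    DEFAULT_MAX_BACKOFF_MS, DEFAULT_DELTA_BACKOFF_MS);
            System.out.println("retry " + retry + ": wait " + waitMs + " ms");
        }
    }
}
```

Under these assumptions the defaults give waits of 2, 4, 8, 16 and 32 ms across
the five retries, so the maximum back-off only comes into play when the delta
or the minimum back-off is raised.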
> ABFS: Long waits and unintended retries when multiple threads try to fetch
> token using ClientCreds
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-17092
> URL: https://issues.apache.org/jira/browse/HADOOP-17092
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.3.0
> Reporter: Sneha Vijayarajan
> Assignee: Bilahari T H
> Priority: Major
> Fix For: 3.4.0
>
>
> Issue reported by DB:
> we recently experienced some problems with ABFS driver that highlighted a
> possible issue with long hangs following synchronized retries when using the
> _ClientCredsTokenProvider_ and calling _AbfsClient.getAccessToken_. We have
> seen
> https://github.com/apache/hadoop/pull/1923,
> but it does not directly apply since we are not using a custom token
> provider, but instead _ClientCredsTokenProvider_ that ultimately relies on
> _AzureADAuthenticator_.
>
> The problem was that the critical section of getAccessToken, combined with a
> possibly redundant retry policy, made jobs hang for a very long time,
> since only one thread at a time could make progress, and this progress
> amounted to basically retrying on a failing connection for 30-60 minutes.
>