[ 
https://issues.apache.org/jira/browse/HADOOP-17092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153972#comment-17153972
 ] 

Bilahari T H commented on HADOOP-17092:
---------------------------------------

*New configs added as part of the JIRA*
The exponential retry policy used for the AAD token fetch retries can be tuned 
with the following configurations.
* fs.azure.oauth.token.fetch.retry.max.retries: Sets the maximum number of 
retries. Default value is 5.
* fs.azure.oauth.token.fetch.retry.min.backoff.interval: Minimum back-off 
interval, in milliseconds. It is added to the retry interval computed from the 
delta back-off. Default value is 0.
* fs.azure.oauth.token.fetch.retry.max.backoff.interval: Maximum back-off 
interval, in milliseconds. Default value is 60000 (sixty seconds).
* fs.azure.oauth.token.fetch.retry.delta.backoff: Back-off interval between 
retries. Multiples of this timespan are used for subsequent retry attempts. 
Default value is 2.
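To illustrate how these four settings might interact, here is a minimal Java sketch. It assumes the wait before retry n (1-based) is min(minBackoff + 2^(n-1) * delta, maxBackoff) and leaves out any random jitter. The class and method names are hypothetical; this is not the ABFS retry policy code. The comment above does not state a unit for delta.backoff, so the sketch simply treats every value as milliseconds.

```java
/**
 * Hypothetical sketch of an exponential back-off schedule driven by the
 * fs.azure.oauth.token.fetch.retry.* settings. Not the ABFS implementation.
 */
final class TokenFetchBackoffSketch {

    // Defaults quoted in the comment; delta is ASSUMED to be in milliseconds.
    static final int DEFAULT_MAX_RETRIES = 5;
    static final long DEFAULT_MIN_BACKOFF_MS = 0;
    static final long DEFAULT_MAX_BACKOFF_MS = 60_000;
    static final long DEFAULT_DELTA_BACKOFF = 2;

    /**
     * Wait before retry number retryCount (1-based):
     * minBackoff + 2^(retryCount - 1) * deltaBackoff, capped at maxBackoff.
     * Random jitter is deliberately omitted to keep the schedule predictable.
     */
    static long retryInterval(int retryCount, long minBackoff,
                              long maxBackoff, long deltaBackoff) {
        if (retryCount < 1) {
            throw new IllegalArgumentException("retryCount must be >= 1");
        }
        double increment = Math.pow(2, retryCount - 1) * deltaBackoff;
        return (long) Math.min(minBackoff + increment, maxBackoff);
    }

    /** ASSUMPTION: retries are allowed while retryCount <= maxRetries. */
    static boolean shouldRetry(int retryCount, int maxRetries) {
        return retryCount <= maxRetries;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int retry = 1; shouldRetry(retry, DEFAULT_MAX_RETRIES); retry++) {
            long wait = retryInterval(retry, DEFAULT_MIN_BACKOFF_MS,
                    DEFAULT_MAX_BACKOFF_MS, DEFAULT_DELTA_BACKOFF);
            total += wait;
            System.out.println("retry " + retry + ": wait " + wait + " ms (assumed unit)");
        }
        System.out.println("total back-off: " + total + " ms (assumed unit)");
    }
}
```

Under these assumptions the defaults give waits of 2, 4, 8, 16 and 32 before retries 1 to 5; max.backoff.interval only takes effect once the doubled delta (plus min.backoff.interval) exceeds it.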

> ABFS: Long waits and unintended retries when multiple threads try to fetch 
> token using ClientCreds
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17092
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17092
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.0
>            Reporter: Sneha Vijayarajan
>            Assignee: Bilahari T H
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Issue reported by DB:
> We recently experienced some problems with the ABFS driver that highlighted a 
> possible issue with long hangs following synchronized retries when using the 
> _ClientCredsTokenProvider_ and calling _AbfsClient.getAccessToken_. We have 
> seen 
> [https://github.com/apache/hadoop/pull/1923],
>  but it does not directly apply, since we are not using a custom token 
> provider but _ClientCredsTokenProvider_, which ultimately relies on 
> _AzureADAuthenticator_. 
>  
> The problem was that the critical section of getAccessToken, combined with a 
> possibly redundant retry policy, made jobs hang for a very long time: only 
> one thread at a time could make progress, and that progress amounted to 
> retrying on a failing connection for 30-60 minutes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
