Serhii Nesterov created HADOOP-19620:
----------------------------------------

             Summary: AzureADAuthenticator should be able to retry on 
UnknownHostException
                 Key: HADOOP-19620
                 URL: https://issues.apache.org/jira/browse/HADOOP-19620
             Project: Hadoop Common
          Issue Type: Improvement
          Components: auth
    Affects Versions: 3.4.1
            Reporter: Serhii Nesterov


When Hadoop is requested to perform operations against ADLS Gen2 storage, 
`AbfsRestOperation` attempts to obtain an access token from Microsoft. 
Under the hood, it uses a simple `java.net.HttpURLConnection` HTTP client.

Occasionally, environments run into intermittent network issues, including 
DNS-related `UnknownHostException`. Technically, the HTTP client throws an 
`IOException` whose cause is `UnknownHostException`. `AzureADAuthenticator` in 
turn catches the `IOException`, sets `httperror = -1`, and then checks whether 
the error is recoverable and can be retried. It is neither an instance of 
`MalformedURLException`, nor an instance of `FileNotFoundException`, nor a 
recoverable status code (< 100 || == 408 || >= 500 && != 501 && != 505), hence 
a retry never occurs. This is a sensitive issue for our project because it 
causes problems with state recovery.
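
As an illustration of how such a failure could be recognised, here is a minimal 
sketch that walks the cause chain of a caught exception looking for 
`UnknownHostException`. The class `UnknownHostCheck` and the method 
`isCausedByUnknownHost` are hypothetical names used only for this example; they 
are not part of `AzureADAuthenticator` or any existing Hadoop API.

```java
import java.net.UnknownHostException;

/** Hypothetical helper, not part of Hadoop: detects DNS failures in a cause chain. */
final class UnknownHostCheck {
    private UnknownHostCheck() {}

    /**
     * Returns true if the throwable itself, or any of its causes, is an
     * UnknownHostException. The depth limit guards against cyclic cause chains.
     */
    static boolean isCausedByUnknownHost(Throwable t) {
        int depth = 0;
        while (t != null && depth < 16) {
            if (t instanceof UnknownHostException) {
                return true;
            }
            t = t.getCause();
            depth++;
        }
        return false;
    }
}
```

A check like this covers both cases: `UnknownHostException` thrown directly, and 
`UnknownHostException` wrapped as the cause of another `IOException`.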

The final exception stack trace on the client side looks as follows (Apache 
Spark application):
{code:java}
Job aborted due to stage failure: Task 14 in stage 384.0 failed 4 times, most 
recent failure: Lost task 14.3 in stage 384.0 TID 3087 10.244.91.7 executor 29: 
Status code: -1 error code: null error message: Auth failure: HTTP Error -1; 
url='https://login.microsoftonline.com/$TENANT_ID/oauth2/v2.0/token' 
AzureADAuthenticator.getTokenCall threw java.net.UnknownHostException: 
login.microsoftonline.com
at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation AbfsRestOperation.java:321
at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute AbfsRestOperation.java:263
at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0 AbfsRestOperation.java:235
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.measureDurationOfInvocation IOStatisticsBinding.java:494
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation IOStatisticsBinding.java:465
at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute AbfsRestOperation.java:233
at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus AbfsClient.java:1099
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus AzureBlobFileSystemStore.java:1164
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus AzureBlobFileSystem.java:766
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus AzureBlobFileSystem.java:756
at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath HadoopInputFile.java:39
at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter ParquetFooterReader.java:39
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$lzycompute$1 ParquetFileFormat.scala:211
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$1 ParquetFileFormat.scala:210
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2 ParquetFileFormat.scala:213
...{code}
I can see that this exception is recovered from in other parts of the Hadoop 
project (e.g., `DefaultAMSProcessor`).

We would like to have a similar retry mechanism for fetching tokens. Moreover, 
`AbfsRestOperation` already handles and retries `UnknownHostException`, but that 
part seems to apply only to storage communication, not to token retrieval.
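
To make the request concrete, below is a rough sketch of the kind of retry 
behaviour we have in mind, not a proposed patch. `TokenFetchRetry`, 
`callWithRetry`, `maxAttempts` and `baseDelayMillis` are hypothetical names 
chosen for this example, and the doubling delay is an assumption; an actual 
change would presumably reuse the existing token fetch retry policy and 
configuration.

```java
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

/** Hypothetical sketch, not Hadoop code: retries a call when DNS resolution fails. */
final class TokenFetchRetry {
    private TokenFetchRetry() {}

    /**
     * Invokes fetch up to maxAttempts times. Only IOExceptions that are, or are
     * caused by, UnknownHostException are retried; the delay doubles after each
     * failed attempt. Any other exception is rethrown immediately.
     */
    static <T> T callWithRetry(Callable<T> fetch, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return fetch.call();
            } catch (IOException e) {
                // Give up on the last attempt or on non-DNS failures.
                if (attempt >= maxAttempts || !isUnknownHost(e)) {
                    throw e;
                }
            }
            Thread.sleep(delay);
            delay *= 2;
        }
    }

    /** Walks the cause chain (bounded depth) looking for UnknownHostException. */
    private static boolean isUnknownHost(Throwable t) {
        for (int depth = 0; t != null && depth < 16; depth++, t = t.getCause()) {
            if (t instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }
}
```

With something along these lines, a transient DNS failure during token 
retrieval would be retried a bounded number of times, while errors such as 
`FileNotFoundException` would still fail on the first attempt.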


