[jira] [Commented] (HADOOP-17377) ABFS: MsiTokenProvider doesn't retry HTTP 429 from the Instance Metadata Service

ASF GitHub Bot (Jira) Tue, 22 Aug 2023 23:03:55 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757828#comment-17757828
 ]


ASF GitHub Bot commented on HADOOP-17377:
-----------------------------------------

hadoop-yetus commented on PR #5273:
URL: https://github.com/apache/hadoop/pull/5273#issuecomment-1689330087

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m  0s |  |  Docker mode activated.  |
   | -1 :x: |  patch  |   0m 17s |  |  
https://github.com/apache/hadoop/pull/5273 does not apply to trunk. Rebase 
required? Wrong Branch? See 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.  
|
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | GITHUB PR | https://github.com/apache/hadoop/pull/5273 |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5273/12/console |
   | versions | git=2.17.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> ABFS: MsiTokenProvider doesn't retry HTTP 429 from the Instance Metadata 
> Service
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-17377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17377
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 3.2.1
>            Reporter: Brandon
>            Priority: Major
>              Labels: pull-request-available
>
> *Summary*
>  The instance metadata service has its own guidance for error handling and 
> retry which are different from the Blob store. 
> [https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token#error-handling]
> In particular, it responds with HTTP 429 if request rate is too high. Whereas 
> Blob store will respond with HTTP 503. The retry policy used only accounts 
> for the latter as it will retry any status >=500. This can result in job 
> instability when running multiple processes on the same host.
> *Environment*
>  * Spark talking to an ABFS store
>  * Hadoop 3.2.1
>  * Running on an Azure VM with user-assigned identity, ABFS configured to use 
> MsiTokenProvider
>  * 6 executor processes on each VM
> *Example*
>  Here's an example error message and stack trace. It's always the same stack 
> trace. This appears in logs a few hundred to low thousands of times a day. 
> It's luckily skating by since the download operation is wrapped in 3 retries.
> {noformat}
> AADToken: HTTP connection failed for getting token from AzureAD. Http 
> response: 429 null
> Content-Type: application/json; charset=utf-8 Content-Length: 90 Request ID:  
> Proxies: none
> First 1K of Body: {"error":"invalid_request","error_description":"Temporarily 
> throttled, too many requests"}
>       at 
> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:190)
>       at 
> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:125)
>       at 
> org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:506)
>       at 
> org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:489)
>       at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getIsNamespaceEnabled(AzureBlobFileSystemStore.java:208)
>       at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:473)
>       at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:437)
>       at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1717)
>       at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
>       at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:724)
>       at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
>       at 
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:812)
>       at 
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:803)
>       at 
> scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:792)
>       at 
> scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>       at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>       at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>       at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>       at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:791)
>       at 
> org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:803)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748){noformat}
>  CC [~mackrorysd], [[email protected]]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HADOOP-17377) ABFS: MsiTokenProvider doesn't retry HTTP 429 from the Instance Metadata Service

Reply via email to