[ 
https://issues.apache.org/jira/browse/HDFS-16671?focusedWorklogId=795932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795932
 ]

ASF GitHub Bot logged work on HDFS-16671:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Jul/22 05:27
            Start Date: 28/Jul/22 05:27
    Worklog Time Spent: 10m 
      Work Description: slfan1989 commented on code in PR #4597:
URL: https://github.com/apache/hadoop/pull/4597#discussion_r931794170


##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/resources/hdfs-rbf-default.xml:
##########
@@ -723,6 +723,14 @@
     </description>
   </property>
 
+  <property>
+    <name>dfs.federation.router.fairness.acquire.timeout</name>
+    <value>1s</value>

Review Comment:
   I see, your configuration is accurate.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 795932)
    Time Spent: 2h 40m  (was: 2.5h)

> RBF: RouterRpcFairnessPolicyController supports configurable permit acquire 
> timeout
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-16671
>                 URL: https://issues.apache.org/jira/browse/HDFS-16671
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> RouterRpcFairnessPolicyController should support a configurable permit acquire
> timeout. The hardcoded 1s timeout is very long, and it has caused an incident in
> our prod environment when one nameservice was busy.
> The optimal timeout should probably be less than p50(avgTime).
> In that situation, all handlers in RBF were waiting to acquire the permit of the busy ns:
> {code:java}
> "IPC Server handler 12 on default port 8888" #2370 daemon prio=5 os_prio=0 tid=? nid=?  waiting on condition [?]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <?> (a java.util.concurrent.Semaphore$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>       at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
>       at org.apache.hadoop.hdfs.server.federation.fairness.AbstractRouterRpcFairnessPolicyController.acquirePermit(AbstractRouterRpcFairnessPolicyController.java:56)
>       at org.apache.hadoop.hdfs.server.federation.fairness.DynamicRouterRpcFairnessPolicyController.acquirePermit(DynamicRouterRpcFairnessPolicyController.java:123)
>       at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.acquirePermit(RouterRpcClient.java:1500)
> {code}
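
The stack trace above shows handler threads parked in a timed Semaphore.tryAcquire. As a rough illustration of why the timeout matters, here is a minimal sketch of per-nameservice permits with a configurable acquire timeout. It is not the Hadoop implementation: the class name PermitControllerSketch, its methods, and the millisecond constructor argument are all hypothetical, and only java.util.concurrent.Semaphore#tryAcquire(long, TimeUnit) is a real API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; names and structure are assumptions, not Hadoop code. */
final class PermitControllerSketch {
  // One semaphore per nameservice id; the permit count caps concurrent handlers.
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
  // Configurable wait instead of a hardcoded 1s.
  private final long acquireTimeoutMs;

  PermitControllerSketch(long acquireTimeoutMs) {
    this.acquireTimeoutMs = acquireTimeoutMs;
  }

  void register(String nsId, int handlerCount) {
    permits.put(nsId, new Semaphore(handlerCount));
  }

  /** Returns true if a permit was obtained within the configured timeout. */
  boolean acquirePermit(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    if (semaphore == null) {
      return false; // unknown nameservice: no permit to hand out
    }
    try {
      // The handler thread parks here for at most acquireTimeoutMs.
      return semaphore.tryAcquire(acquireTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  void releasePermit(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    if (semaphore != null) {
      semaphore.release();
    }
  }

  public static void main(String[] args) {
    PermitControllerSketch controller = new PermitControllerSketch(50);
    controller.register("ns-busy", 1);
    boolean first = controller.acquirePermit("ns-busy");  // takes the only permit
    long start = System.nanoTime();
    boolean second = controller.acquirePermit("ns-busy"); // waits, then gives up
    long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println("first=" + first + ", second=" + second
        + ", waited about " + waitedMs + " ms");
  }
}
```

With a long timeout, every handler that targets the busy nameservice stays parked for the full wait, so a shorter configured value lets handlers fail fast and serve other nameservices.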



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

