[
https://issues.apache.org/jira/browse/HDFS-16671?focusedWorklogId=794544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794544
]
ASF GitHub Bot logged work on HDFS-16671:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Jul/22 18:04
Start Date: 23/Jul/22 18:04
Worklog Time Spent: 10m
Work Description: ayushtkn commented on code in PR #4597:
URL: https://github.com/apache/hadoop/pull/4597#discussion_r928150015
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/fairness/TestRouterRpcFairnessPolicyController.java:
##########
@@ -83,6 +87,29 @@ public void testHandlerAllocationPreconfigured() {
assertFalse(routerRpcFairnessPolicyController.acquirePermit(CONCURRENT_NS));
}
+ @Test
+ public void testAcquireTimeout() {
+ Configuration conf = createConf(40);
+ conf.setInt(DFS_ROUTER_FAIR_HANDLER_COUNT_KEY_PREFIX + "ns1", 30);
+ conf.setTimeDuration(DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT, 100, TimeUnit.MILLISECONDS);
+ RouterRpcFairnessPolicyController routerRpcFairnessPolicyController =
+ FederationUtil.newFairnessPolicyController(conf);
+
+ // ns1 should have 30 permits allocated
+ for (int i = 0; i < 30; i++) {
+ assertTrue(routerRpcFairnessPolicyController.acquirePermit("ns1"));
+ }
+ long acquireBeginTimeMs = Time.monotonicNow();
+ assertFalse(routerRpcFairnessPolicyController.acquirePermit("ns1"));
+ long acquireEndTimeMs = Time.monotonicNow();
+
+ long acquireTimeMs = acquireEndTimeMs - acquireBeginTimeMs;
+
+ // There are some other operations, so acquireTimeMs >= 100ms.
+ assertTrue(acquireTimeMs >= 100);
+ assertTrue(acquireTimeMs < 100 + 50);
Review Comment:
Either we keep the safe margin well above this value, or we remove this
upper-bound assertion. @goiri do you have any suggestions?
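
For illustration only, here is a minimal Java sketch of the timing concern raised above. It uses a plain java.util.concurrent.Semaphore rather than the real RouterRpcFairnessPolicyController, and the class name TimedAcquireSketch, the method measureTimedOutAcquireMs, and the 5000 ms margin are all hypothetical choices, not part of the PR. The lower bound is reliable because a timed acquire does not give up before its deadline. The upper bound depends on scheduling and GC pauses, so a tight margin such as 50 ms can fail on a loaded CI host.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the test under review; not Hadoop code.
class TimedAcquireSketch {

  /**
   * Drains a semaphore with the given number of permits, then measures how
   * long one more timed acquire blocks before giving up, in milliseconds.
   */
  static long measureTimedOutAcquireMs(int permits, long timeoutMs) {
    Semaphore semaphore = new Semaphore(permits);
    try {
      // Take every permit so the next acquire has to wait.
      semaphore.acquire(permits);
      long beginNs = System.nanoTime();
      boolean acquired = semaphore.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
      long elapsedNs = System.nanoTime() - beginNs;
      if (acquired) {
        throw new IllegalStateException("acquire should have timed out");
      }
      return TimeUnit.NANOSECONDS.toMillis(elapsedNs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while measuring", e);
    }
  }

  public static void main(String[] args) {
    long timeoutMs = 100;
    long elapsedMs = measureTimedOutAcquireMs(30, timeoutMs);

    // Lower bound: the wait cannot end before the timeout has elapsed.
    if (elapsedMs < timeoutMs) {
      throw new AssertionError("returned too early: " + elapsedMs + " ms");
    }
    // Upper bound: keep a generous margin (an arbitrary 5 s here) so that
    // scheduler or GC pauses do not make the check flaky.
    if (elapsedMs >= timeoutMs + 5000) {
      throw new AssertionError("returned far too late: " + elapsedMs + " ms");
    }
    System.out.println("timed-out acquire took " + elapsedMs + " ms");
  }
}
```

Dropping the upper-bound check entirely, the other option mentioned in the comment, would leave only the lower-bound assertion in place.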
Issue Time Tracking
-------------------
Worklog Id: (was: 794544)
Time Spent: 2h (was: 1h 50m)
> RBF: RouterRpcFairnessPolicyController supports configurable permit acquire
> timeout
> -----------------------------------------------------------------------------------
>
> Key: HDFS-16671
> URL: https://issues.apache.org/jira/browse/HDFS-16671
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> RouterRpcFairnessPolicyController should support a configurable permit
> acquire timeout. The hardcoded 1s is very long, and it has caused an incident
> in our prod environment when one nameservice was busy.
> The optimal timeout should probably be less than p50(avgTime).
> During the incident, all handlers in RBF were waiting to acquire the permit
> of the busy ns:
> {code:java}
> "IPC Server handler 12 on default port 8888" #2370 daemon prio=5 os_prio=0
> tid=? nid=? waiting on condition [?]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <?> (a
> java.util.concurrent.Semaphore$NonfairSync)
> at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
> at
> org.apache.hadoop.hdfs.server.federation.fairness.AbstractRouterRpcFairnessPolicyController.acquirePermit(AbstractRouterRpcFairnessPolicyController.java:56)
> at
> org.apache.hadoop.hdfs.server.federation.fairness.DynamicRouterRpcFairnessPolicyController.acquirePermit(DynamicRouterRpcFairnessPolicyController.java:123)
> at
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.acquirePermit(RouterRpcClient.java:1500)
> {code}
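
The behavior described in the issue can be sketched roughly as follows. This is a simplified, hypothetical model and not the Hadoop implementation: PermitControllerSketch, setPermits, and the constructor parameter are invented for illustration. The only real API used is java.util.concurrent.Semaphore.tryAcquire(long, TimeUnit), which also appears in the stack trace above. The point is that the wait is taken from a configurable value instead of a fixed 1 second, so a handler blocked on a busy nameservice gives up sooner.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of a per-nameservice permit controller.
class PermitControllerSketch {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
  // Assumed to come from configuration rather than being hardcoded.
  private final long acquireTimeoutMs;

  PermitControllerSketch(long acquireTimeoutMs) {
    this.acquireTimeoutMs = acquireTimeoutMs;
  }

  /** Assigns a fixed number of handler permits to a nameservice. */
  void setPermits(String nsId, int count) {
    permits.put(nsId, new Semaphore(count));
  }

  /**
   * Tries to take one permit for the nameservice, waiting at most the
   * configured timeout. Returns false on timeout, interruption, or when
   * the nameservice is unknown.
   */
  boolean acquirePermit(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    if (semaphore == null) {
      return false;
    }
    try {
      return semaphore.tryAcquire(acquireTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // Restore the interrupt flag and report failure to the caller.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Returns one permit to the nameservice, if it is known. */
  void releasePermit(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    if (semaphore != null) {
      semaphore.release();
    }
  }
}
```

With a short timeout, a caller that cannot get a permit for a saturated nameservice is released quickly and can reject or retry the request, instead of parking for a full second while every other handler does the same.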