[ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974123#comment-15974123 ]
Zhe Zhang commented on HDFS-11384:
----------------------------------

Thanks [~shv], the main logic LGTM. I could not reproduce reported unit test failures either. +1 pending a few final comments:
# IIUC, {{BALANCER_NUM_RPC_PER_SEC}} is a best-effort throttling target, instead of a guaranteed threshold. E.g. it looks possible for {{Thread.sleep(delay)}} to be interrupted and {{getBlockList}} to be retried in the while loop. Or the entire {{dispatchBlocks}} call in a thread could die before {{delay}} seconds, then another {{future\[j\]}} will be issued without the delay. (Assuming this understanding is correct), I think this is the right way to handle this logic -- it is a good idea not to optimize for these rare cases. But can we update the documentation for {{BALANCER_NUM_RPC_PER_SEC}} to reflect it?
# {{private void dispatchBlocks(long delay) {}} doesn't explain {{delay}} in its Javadoc.
# What does {{testBalancerRPCDelay}} verify? It is not checking the number of RPC calls.

> Add option for balancer to disperse getBlocks calls to avoid NameNode's
> rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: balancer.day.png, balancer.week.png,
> HDFS-11384.001.patch, HDFS-11384.002.patch, HDFS-11384.003.patch,
> HDFS-11384.004.patch, HDFS-11384.005.patch
>
>
> When running balancer on hadoop cluster which have more than 3000 Datanodes
> will cause NameNode's rpc.CallQueueLength spike. We observed this situation
> could cause Hbase cluster failure due to RegionServer's WAL timeout.
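
To illustrate the best-effort point in the first review comment, here is a minimal Java sketch. It is not the code from the patch: the class {{BestEffortThrottle}}, the methods {{delayMillis}} and {{issueCalls}}, the millisecond unit, and the way the delay is derived from the per-second rate are all assumptions made for illustration. It only shows that when the sleep before a call is interrupted, the call still goes out early, so the configured rate is a target rather than a hard cap.

```java
/**
 * Hypothetical sketch, NOT the Balancer code from HDFS-11384.
 * All names and the millisecond unit are assumptions for illustration.
 */
final class BestEffortThrottle {

  /** Assumed derivation: spacing between calls for a target rate; 0 disables it. */
  static long delayMillis(int maxCallsPerSec) {
    return maxCallsPerSec <= 0 ? 0L : 1000L / maxCallsPerSec;
  }

  /**
   * Runs the given call numCalls times, sleeping delayMs before each one.
   * If a sleep is interrupted, the call is still issued (earlier than
   * intended), which is why the rate is only a best-effort target.
   *
   * @return how many of the waits were cut short by an interrupt
   */
  static int issueCalls(int numCalls, long delayMs, Runnable call) {
    boolean interrupted = false;
    int shortened = 0;
    for (int i = 0; i < numCalls; i++) {
      try {
        Thread.sleep(delayMs);
      } catch (InterruptedException e) {
        // The wait ended early; remember it and carry on with the call.
        interrupted = true;
        shortened++;
      }
      call.run();
    }
    if (interrupted) {
      // Restore the interrupt status for the caller once the loop is done.
      Thread.currentThread().interrupt();
    }
    return shortened;
  }
}
```

In this sketch a single interrupt shortens exactly one wait and every call is still issued, which matches the behavior the review asks to have documented rather than changed.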