[ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965266#comment-15965266 ]
Konstantin Shvachko commented on HDFS-11384:
--------------------------------------------

* I am usually very conservative about introducing new configuration parameters. Parameters seem to give you flexibility, but in many cases administrators don't know what to do with that flexibility, because there are so many of them. I prefer to start with a reasonable constant value and add a config variable later if _other_ values turn out to be needed in certain cases. In the end, adding configs is easy, but you can never remove them. In this particular case BALANCER_NUM_RPC_PER_SEC is chosen so that big clusters distribute their _initial_ RPC requests over 10 seconds, and it does not affect small clusters at all. I think we are good with the constant set to 20 for now, but let me know if you see use cases for different values.
* Fixed the typo in the 004 patch. Thanks [~zhz].
* This would be a typical misuse of Preconditions, which we do in many places in the code and which has been discussed on many previous occasions. It is an assert because we assume the condition should never happen. If it does, it is a bug, which should be caught during testing with the {{-ea}} option. At runtime we want to avoid checking any extra conditions for performance reasons.

> Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: balancer.day.png, balancer.week.png, HDFS-11384.001.patch, HDFS-11384.002.patch, HDFS-11384.003.patch, HDFS-11384.004.patch
>
>
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes causes the NameNode's rpc.CallQueueLength to spike.
> We observed that this situation could cause HBase cluster failure due to RegionServer WAL timeouts.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
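The arithmetic behind the first point of the comment can be sketched as follows. This is an illustration only, not the code from the attached patches. It assumes that each balancer source issues one initial getBlocks RPC and that a hypothetical helper, `computeStartDelaysMs`, staggers the sources so that at most BALANCER_NUM_RPC_PER_SEC (20) calls start in any one-second slot. With, for example, 200 sources dispatched at once (an assumed figure; the comment does not say how many sources a big cluster has), the last batch starts 9 seconds in, so the initial calls span ten one-second slots. A cluster with 20 or fewer sources sees no delay.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: stagger the first getBlocks() call of each balancer
 * source so that no more than BALANCER_NUM_RPC_PER_SEC calls start in any
 * one-second slot. Not taken from the HDFS-11384 patches.
 */
class GetBlocksDispersal {
  // Same value as the constant discussed in the comment (20 RPCs per second).
  static final int BALANCER_NUM_RPC_PER_SEC = 20;

  /** Returns the start delay in milliseconds for each of numSources sources. */
  static long[] computeStartDelaysMs(int numSources) {
    long[] delays = new long[numSources];
    for (int i = 0; i < numSources; i++) {
      // Sources 0..19 start immediately, 20..39 after 1 s, 40..59 after 2 s, ...
      delays[i] = (i / BALANCER_NUM_RPC_PER_SEC) * 1000L;
    }
    return delays;
  }

  public static void main(String[] args) {
    long[] big = computeStartDelaysMs(200);
    long[] small = computeStartDelaysMs(15);
    // 200 sources: the last batch starts after 9000 ms (ten 1-second slots).
    System.out.println("200 sources, max delay ms: " + big[big.length - 1]);
    // 15 sources: every call starts immediately.
    System.out.println("15 sources, max delay ms: "
        + Arrays.stream(small).max().getAsLong());
  }
}
```

A fixed rate keeps the behavior easy to reason about: the delay grows only with the number of sources, so small clusters are unaffected, which matches the argument for a constant rather than a new configuration key.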
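The distinction drawn in the third point of the comment can be illustrated in plain Java. An `assert` statement is evaluated only when the JVM runs with assertions enabled (`-ea`). An explicit check, such as Guava's `Preconditions.checkArgument`, runs on every call. The sketch below uses a plain `if`/`throw` in place of Guava so that it is self-contained. The class and method names, and the example invariant (a positive thread count), are hypothetical and chosen only for illustration.

```java
/** Hypothetical example contrasting a debug-only assert with an always-on check. */
class AssertVsCheck {
  // Internal invariant: violating it is a bug. Checked only when run with -ea.
  static int perThreadShare(int total, int threads) {
    assert threads > 0 : "Number of threads must be positive";
    return total / threads;
  }

  // Always-on check: evaluated on every call, whether or not -ea is set.
  static int perThreadShareChecked(int total, int threads) {
    if (threads <= 0) {
      throw new IllegalArgumentException("Number of threads must be positive");
    }
    return total / threads;
  }

  /** Returns true if assertions are enabled for this class. */
  static boolean assertionsEnabled() {
    boolean enabled = false;
    // Intentional side effect: the assignment runs only when assertions are on.
    assert enabled = true;
    return enabled;
  }

  public static void main(String[] args) {
    System.out.println("assertions enabled: " + assertionsEnabled());
    System.out.println(perThreadShare(200, 20));
    try {
      perThreadShareChecked(200, 0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Running `java -ea AssertVsCheck` activates the assert. Without `-ea`, a call such as `perThreadShare(200, 0)` skips the condition entirely and fails later with an `ArithmeticException` from the division by zero. That is the trade-off the comment describes: the invariant is verified in testing but costs nothing extra in production.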