[jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

Konstantin Shvachko (JIRA) Tue, 11 Apr 2017 19:10:54 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965266#comment-15965266
 ]


Konstantin Shvachko commented on HDFS-11384:
--------------------------------------------

* I am usually very conservative about introducing new configuration 
parameters. Parameters seem to give you flexibility to adjust them, but in many 
cases administrators don't know what to do with that flexibility, because there 
are so many of them. I prefer to have a reasonable constant value initially, and 
add a config variable later if _other_ values are needed in certain cases. In 
the end adding configs is easy, but you can never remove them.
In this particular case the BALANCER_NUM_RPC_PER_SEC is chosen so that big 
clusters would distribute _initial_ RPC requests over 10 secs, and it does not 
affect small clusters at all. I think we are good with the constant set to 20 
for now, but let me know if you see use cases for different values.
* Fixed the typo in 004 patch. Thanks [~zhz].
* This would be a typical misuse of Preconditions, as we do in many cases in 
the code, and as it was discussed previously on many occasions. It is an 
assert, because we assume the condition should never happen. If it does, it's a 
bug, which should be caught during testing with the {{-ea}} option. At runtime 
we want to avoid checking any extra conditions, for performance reasons.
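
The arithmetic behind the first point can be sketched as follows. This is only an illustration, not the code in the patch: the constant name and its value of 20 come from the comment above, while the class name, both helper methods, and the thread counts used in {{main}} are hypothetical. It assumes that a pool smaller than the per-second rate is left undelayed, and that a larger pool has its initial calls spaced evenly across the resulting window.

```java
import java.util.concurrent.TimeUnit;

public class DispersedGetBlocks {
    // Value named in the comment above; everything else here is illustrative.
    static final int BALANCER_NUM_RPC_PER_SEC = 20;

    /** Seconds over which the initial calls are spread; 0 means no dispersal. */
    static long spreadSeconds(int concurrentThreads) {
        if (concurrentThreads < BALANCER_NUM_RPC_PER_SEC) {
            return 0; // small cluster: nothing to spread out
        }
        return concurrentThreads / BALANCER_NUM_RPC_PER_SEC;
    }

    /** Start delay for one thread so that calls are evenly spaced in the window. */
    static long startDelayMillis(int threadIndex, int concurrentThreads) {
        long windowMs = TimeUnit.SECONDS.toMillis(spreadSeconds(concurrentThreads));
        return windowMs * threadIndex / concurrentThreads;
    }

    public static void main(String[] args) {
        int[] poolSizes = {10, 200}; // hypothetical small and large pools
        for (int threads : poolSizes) {
            System.out.println(threads + " threads: window "
                + spreadSeconds(threads) + " s, last thread starts after "
                + startDelayMillis(threads - 1, threads) + " ms");
        }
    }
}
```

Under these assumptions a pool of 200 threads gets a 10-second window at 20 calls per second, and a pool of fewer than 20 threads gets no delay at all.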

> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch, HDFS-11384.003.patch, 
> HDFS-11384.004.patch
>
>
> Running the balancer on a Hadoop cluster that has more than 3000 DataNodes 
> causes a spike in the NameNode's rpc.CallQueueLength. We observed that this 
> can cause HBase cluster failure due to RegionServer WAL timeouts.



