[ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933354#comment-15933354
 ] 

yunjiong zhao edited comment on HDFS-11384 at 3/20/17 7:26 PM:
---------------------------------------------------------------

Thanks [~shv] for the review.
The Balancer allows only one thread to issue getBlocks() at any given time, 
and only when dfs.balancer.getBlocks.interval.millis is set to a non-zero 
value. Otherwise this patch doesn't change anything.
So there is actually only one change.

If we used wait(), it would release the lock, so we couldn't guarantee that 
only one thread calls getBlocks() at a time.

By default, this patch doesn't change anything. So if you need to run the 
Balancer aggressively, don't set dfs.balancer.getBlocks.interval.millis.
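
To illustrate the locking point, here is a minimal sketch, not the code from the patch. The class GetBlocksThrottle and its call() method are hypothetical names, and sleeping after the call (rather than before it) is an assumption. It only shows why sleeping while holding the lock keeps getBlocks() single-threaded, and why a zero interval leaves the behavior unchanged.

```java
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not taken from the patch. */
final class GetBlocksThrottle {
  // Assumed to hold the value of dfs.balancer.getBlocks.interval.millis;
  // 0 means throttling is off.
  private final long intervalMillis;
  private final Object lock = new Object();

  GetBlocksThrottle(long intervalMillis) {
    this.intervalMillis = intervalMillis;
  }

  /** Runs the supplied getBlocks call, serialized when the interval is non-zero. */
  <T> T call(Supplier<T> getBlocks) throws InterruptedException {
    if (intervalMillis <= 0) {
      // Default case: no lock and no delay, so callers run concurrently.
      return getBlocks.get();
    }
    synchronized (lock) {
      T result = getBlocks.get();
      // Thread.sleep() keeps the monitor, so no other thread can enter this
      // block until the interval has elapsed. lock.wait(intervalMillis) would
      // release the monitor and let a second thread issue getBlocks().
      Thread.sleep(intervalMillis);
      return result;
    }
  }
}
```

With a non-zero interval, two back-to-back calls take at least twice the interval, which is what spreads the getBlocks() load on the NameNode.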



{quote}
Can we add some heuristics so that the Balancer could adjust by itself instead 
of adding the configuration parameter
{quote}
I thought about this before. The best approach I can think of is to add a new 
function to IPC that lets clients get the CallQueueLength; if the 
CallQueueLength is too high, block getBlocks() until the CallQueueLength 
returns to normal.
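
A rough sketch of that idea follows. Everything in it is an assumption: no IPC call exposes the CallQueueLength to clients today, so it is modeled here as a plain IntSupplier, and the class name CallQueueBackoff, the threshold, and the polling period are made up for illustration.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of the heuristic; nothing here exists in Hadoop. */
final class CallQueueBackoff {
  // Stand-in for the proposed IPC function that would report the
  // NameNode's CallQueueLength to the client.
  private final IntSupplier callQueueLength;
  private final int maxQueueLength;  // assumed threshold for "too high"
  private final long pollMillis;     // assumed delay between checks

  CallQueueBackoff(IntSupplier callQueueLength, int maxQueueLength, long pollMillis) {
    this.callQueueLength = callQueueLength;
    this.maxQueueLength = maxQueueLength;
    this.pollMillis = pollMillis;
  }

  /** Blocks the caller until the reported queue length is at or below the threshold. */
  void awaitNormalQueue() throws InterruptedException {
    while (callQueueLength.getAsInt() > maxQueueLength) {
      Thread.sleep(pollMillis);
    }
  }
}
```

The Balancer would call awaitNormalQueue() right before each getBlocks(), so it backs off by itself instead of relying on a configured interval.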





> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch
>
>
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes 
> causes the NameNode's rpc.CallQueueLength to spike. We observed that this 
> can cause HBase cluster failures due to RegionServer WAL timeouts.


