[jira] [Comment Edited] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

Vinitha Reddy Gankidi (JIRA) Thu, 30 Mar 2017 15:30:06 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949978#comment-15949978
 ]


Vinitha Reddy Gankidi edited comment on HDFS-11384 at 3/30/17 10:29 PM:
------------------------------------------------------------------------

[~shv] I'm leaning towards (4) instead of (3).
{{isGoodBlockCandidate}} needs a global view of the block replicas. Also, there 
is some additional logic to deal with erasure-coded (EC) blocks, and this may be 
a blocker for reading from DNs. [~zhz], you probably have more context regarding 
the EC blocks.
{code}
  /**
   * Decide if the block/blockGroup is a good candidate to be moved from source
   * to target. A block is a good candidate if
   * 1. the block is not in the process of being moved/has not been moved;
   * 2. the block does not have a replica/internalBlock on the target;
   * 3. doing the move does not reduce the number of racks that the block has
   */
  private boolean isGoodBlockCandidate(StorageGroup source, StorageGroup target,
      StorageType targetStorageType, DBlock block) {
{code}
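
To illustrate why the check needs a global view, here is a minimal sketch of the three conditions from the Javadoc above. It is not the Balancer/Dispatcher code: {{BlockCandidateSketch}}, {{Location}}, {{rackCount}} and {{isGoodCandidate}} are hypothetical names, the rack test is simplified to comparing distinct-rack counts before and after the move, and EC block groups are not modeled. The point is only that conditions 2 and 3 need every replica location of the block, which a single DN would not know.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlockCandidateSketch {

  /** Hypothetical stand-in for a replica location: a datanode and its rack. */
  static final class Location {
    final String datanode;
    final String rack;

    Location(String datanode, String rack) {
      this.datanode = datanode;
      this.rack = rack;
    }
  }

  /** Counts the distinct racks covered by the given replica locations. */
  static int rackCount(List<Location> locations) {
    Set<String> racks = new HashSet<>();
    for (Location l : locations) {
      racks.add(l.rack);
    }
    return racks.size();
  }

  /**
   * Simplified version of the three checks. replicas is the full list of
   * locations for the block; sourceIndex selects the replica to move.
   */
  static boolean isGoodCandidate(List<Location> replicas, int sourceIndex,
      Location target, boolean movedOrPending) {
    // 1. The block must not be in the process of being moved or already moved.
    if (movedOrPending) {
      return false;
    }
    // 2. The target must not already hold a replica of this block.
    for (Location l : replicas) {
      if (l.datanode.equals(target.datanode)) {
        return false;
      }
    }
    // 3. The move must not reduce the number of racks the block spans.
    List<Location> after = new ArrayList<>(replicas);
    after.set(sourceIndex, target);
    return rackCount(after) >= rackCount(replicas);
  }

  public static void main(String[] args) {
    List<Location> replicas = Arrays.asList(
        new Location("dn1", "rackA"),
        new Location("dn2", "rackB"),
        new Location("dn3", "rackB"));
    // Moving the only rackA replica onto rackB drops the block to one rack.
    System.out.println(
        isGoodCandidate(replicas, 0, new Location("dn4", "rackB"), false)); // false
    // Moving one of the two rackB replicas to a new rack is fine.
    System.out.println(
        isGoodCandidate(replicas, 1, new Location("dn5", "rackC"), false)); // true
  }
}
```

In this sketch both the duplicate-replica check and the rack check iterate over all replica locations, which is the information the NameNode has and an individual DN does not.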

I agree that (2) and (4) are complementary. 


was (Author: redvine):
[~shv] I'm leaning towards reading from (4) instead of (3).
{{isGoodBlockCandidate}} needs a global view of the block replicas. Also there 
is some additional logic to deal with erasure coded(EC) blocks and this may be 
a blocker for reading from DNs. [~zhz] you probably have more context regarding 
the EC blocks.
{code}
 /**
   * Decide if the block/blockGroup is a good candidate to be moved from source
   * to target. A block is a good candidate if
   * 1. the block is not in the process of being moved/has not been moved;
   * 2. the block does not have a replica/internalBlock on the target;
   * 3. doing the move does not reduce the number of racks that the block has
   */
  private boolean isGoodBlockCandidate(StorageGroup source, StorageGroup target,
      StorageType targetStorageType, DBlock block) {
{code}

I agree that (2) and (4) are complimentary. 

> Add option for balancer to disperse getBlocks calls to avoid NameNode's 
> rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch
>
>
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes 
> causes the NameNode's rpc.CallQueueLength to spike. We observed that this 
> can cause HBase cluster failures due to RegionServer WAL timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

