[ 
https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321709#comment-14321709
 ] 

Frode Halvorsen commented on HDFS-7787:
---------------------------------------

Hello.

This was some time ago, and it may be that I didn't have any decommissioning 
nodes when I observed that the namenode didn't prioritize the blocks with 
only one replica first. When I look in the logs now, the namenode asks the 
decommissioning node to replicate every block to three other nodes, so I 
believe it only gets replication requests for blocks with no live replicas.

That leaves me with the challenge of speeding up the process :)
 

> Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority 
> to blocks on nodes being decomissioned
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7787
>                 URL: https://issues.apache.org/jira/browse/HDFS-7787
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.6.0
>         Environment: 2 namenodes HA, 6 datanodes in two racks
>            Reporter: Frode Halvorsen
>              Labels: balance, hdfs, replication-performance
>
> Each file has a replication factor of 3, with replicas split across different racks.
> After a simulated crash of one rack (shutting down all its nodes, deleting the 
> data directory and restarting the nodes) and the decommissioning of one of the 
> nodes in the other rack, replication does not follow the 'normal' rules...
> My cluster has approximately 25 million files, and the one node I am now trying 
> to decommission has 9 million under-replicated blocks and 3.5 million blocks 
> with 'no live replicas'. After a restart of the node, it starts to replicate 
> both types of blocks, but after a while it only replicates under-replicated 
> blocks that have other live copies. I would think that the 'normal' approach 
> would be to make sure that all blocks for which this node holds the only copy 
> are the first to be replicated/balanced?  
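For illustration, the ordering requested in this issue can be sketched as a priority queue. This is a minimal sketch under stated assumptions, not HDFS code: the `Priority` levels, the `priority` and `replication_order` helpers, and the `(block_id, live, decommissioning, expected)` tuples are all hypothetical. It splits the highest-priority level in two, so that blocks whose only copies sit on decommissioning nodes are scheduled before blocks that still have exactly one live replica.

```python
import heapq
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical levels; a lower value means "replicate sooner".
    ONLY_ON_DECOMMISSIONING = 0  # no live replica, but copies exist on decommissioning nodes
    SINGLE_LIVE_REPLICA = 1      # exactly one live replica left
    UNDER_REPLICATED = 2         # more than one live replica, but fewer than expected
    MISSING = 3                  # no copy anywhere; replication cannot help


def priority(live, decommissioning, expected):
    """Return the hypothetical queue for a block, or None if it is fully replicated."""
    if live >= expected:
        return None
    if live == 0:
        return Priority.ONLY_ON_DECOMMISSIONING if decommissioning > 0 else Priority.MISSING
    if live == 1:
        return Priority.SINGLE_LIVE_REPLICA
    return Priority.UNDER_REPLICATED


def replication_order(blocks):
    """Order (block_id, live, decommissioning, expected) tuples by priority.

    The input sequence number breaks ties, so blocks of equal priority keep
    their original order and block ids are never compared.
    """
    heap = []
    for seq, (block_id, live, decom, expected) in enumerate(blocks):
        p = priority(live, decom, expected)
        if p is not None:
            heap.append((p, seq, block_id))
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

With input `[("b1", 2, 1, 3), ("b2", 0, 1, 3), ("b3", 1, 1, 3), ("b4", 3, 0, 3)]`, the sketch returns `["b2", "b3", "b1"]`: the block held only by the decommissioning node comes first, and the fully replicated `b4` is skipped.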



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
