[ 
https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321968#comment-14321968
 ] 

Frode Halvorsen commented on HDFS-7787:
---------------------------------------

Sorry, my grep was wrong and included a lot of replications from earlier times, 
but they were still from the same decom-node.

The correct stats for the 10 minutes between 13:00 and 13:10 today are:
a total of 3161 started threads. None of those were for blocks with two live 
replicas, but 2430 were for blocks with one live replica and only 731 were for 
blocks without live replicas.
That means that only about 1/4 of the replicated blocks were of the 'highest 
priority'. And of course this made my day worse; I now have to wait one month 
before I can take down the node...
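
As a rough check of those numbers, here is a minimal sketch of the arithmetic. 
It assumes the 10-minute rate stays constant and that the 3.5 million 'no live 
replicas' figure from the issue description still applies; the variable names 
are made up for illustration.

```python
# Counts from the 13:00-13:10 window reported above.
started_total = 3161
one_live_replica = 2430
no_live_replica = 731
window_minutes = 10

# Assumption: figure taken from the issue description, treated as still current.
blocks_without_live_replica = 3_500_000

# Share of started replications that were for blocks with no live replica.
share_highest = no_live_replica / started_total

# Assumption: the observed rate stays constant until the backlog is drained.
rate_per_minute = no_live_replica / window_minutes
days_to_drain = blocks_without_live_replica / rate_per_minute / (60 * 24)

print(f"highest-priority share: {share_highest:.1%}")  # 23.1%
print(f"days to drain backlog: {days_to_drain:.1f}")   # about 33 days
```

At this rate the blocks without live replicas alone take about a month, which 
is where the estimate above comes from.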


> Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority 
> to blocks on nodes being decomissioned
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7787
>                 URL: https://issues.apache.org/jira/browse/HDFS-7787
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.6.0
>         Environment: 2 namenodes HA, 6 datanodes in two racks
>            Reporter: Frode Halvorsen
>              Labels: balance, hdfs, replication-performance
>
> Each file is set to 3 replicas, split across different racks.
> After a simulated crash of one rack (shutdown of all nodes, deleted 
> data-directory and started nodes) and decommission of one of the nodes in the 
> other rack, the replication does not follow 'normal' rules...
> My cluster has approx. 25 million files, and the one node I am now trying to 
> decommission has 9 million under-replicated blocks, and 3.5 million blocks 
> with 'no live replicas'. After a restart of the node, it starts to replicate 
> both types of blocks, but after a while, it only replicates under-replicated 
> blocks with other live copies. I would think that the 'normal' way to do this 
> would be to make sure that all blocks this node keeps the only copy of are 
> the first to be replicated/balanced?
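
The ordering asked for in the description could be sketched roughly as below. 
This is only an illustration of the idea, not the HDFS implementation: the 
function name, the priority numbers, and the block tuples are all hypothetical.

```python
def replication_priority(live, decommissioning, expected=3):
    """Hypothetical priority: a lower number is replicated first."""
    if live == 0 and decommissioning > 0:
        return 0  # only copies are on decommissioning nodes: most urgent
    if live == 0:
        return 4  # no replica left anywhere to copy from
    if live == 1:
        return 1  # a single live replica remains
    if live < expected:
        return 2  # under-replicated, but several live copies exist
    return 3      # fully replicated

# (block id, live replicas, replicas on decommissioning nodes); made-up data.
blocks = [("blk_a", 1, 1), ("blk_b", 0, 1), ("blk_c", 2, 0)]

# sorted() is stable, so blocks with equal priority keep their input order.
ordered = sorted(blocks, key=lambda b: replication_priority(b[1], b[2]))
print([name for name, _, _ in ordered])  # ['blk_b', 'blk_a', 'blk_c']
```

With such an ordering, the blocks whose only copy sits on the decommissioning 
node would all be scheduled before blocks that still have a live replica.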



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
