[ 
https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frode Halvorsen updated HDFS-7787:
----------------------------------
    Description: 
Each file has a replication factor of 3, with replicas split across different racks.
After a simulated crash of one rack (shutdown of all nodes, deletion of the 
data directory, and restart of the nodes) and decommissioning of one of the 
nodes in the other rack, replication does not follow the 'normal' rules...

My cluster has approx. 25 million files, and the node I am now trying to 
decommission has 9 million under-replicated blocks and 3.5 million blocks 
with 'no live replicas'. After a restart of the node, it starts to replicate 
both types of blocks, but after a while it only replicates under-replicated 
blocks that have other live copies. I would think the 'normal' way to do this 
would be to make sure that all blocks for which this node keeps the only copy 
are the first to be replicated/balanced?
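The behavior requested above — drain blocks that have no other live replica before blocks that merely fall short of their replication factor — amounts to a multi-level priority queue keyed on the live-replica count (the NameNode's block manager keeps similar priority levels internally). A minimal illustrative sketch, not actual HDFS code; the class and method names here are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch (not HDFS source): bucket under-replicated blocks
 * so that blocks with zero other live replicas are re-replicated before
 * blocks that still have live copies elsewhere.
 */
public class ReplicationQueueSketch {
    // Priority 0: the decommissioning node holds the ONLY copy.
    // Priority 1: under-replicated, but other live replicas exist.
    static final int HIGHEST = 0;
    static final int NORMAL = 1;

    private final List<Deque<String>> queues = new ArrayList<>();

    public ReplicationQueueSketch() {
        queues.add(new ArrayDeque<>()); // HIGHEST
        queues.add(new ArrayDeque<>()); // NORMAL
    }

    /** Queue a block, bucketed by how many live replicas exist elsewhere. */
    public void add(String blockId, int otherLiveReplicas) {
        queues.get(otherLiveReplicas == 0 ? HIGHEST : NORMAL).add(blockId);
    }

    /** Next block to replicate: highest-priority queue drains first. */
    public String next() {
        for (Deque<String> q : queues) {
            if (!q.isEmpty()) return q.poll();
        }
        return null;
    }

    public static void main(String[] args) {
        ReplicationQueueSketch s = new ReplicationQueueSketch();
        s.add("blk_A", 2); // has live copies elsewhere
        s.add("blk_B", 0); // this node holds the only copy
        s.add("blk_C", 1);
        System.out.println(s.next()); // blk_B drains first
    }
}
```

Under this scheme the 3.5 million sole-copy blocks would be scheduled ahead of the 9 million blocks that still have live replicas, instead of the two classes being interleaved.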


  was:
Each file has a replication factor of 3, with replicas split across different racks.
After a simulated crash of one rack (shutdown of all nodes, deletion of the 
data directory, and restart of the nodes) and decommissioning of one of the 
nodes in the other rack, replication does not follow the 'normal' rules...

My cluster has approx. 25 million files, and the node I am now trying to 
decommission has 9 million under-replicated blocks and 3.5 million blocks 
with 'no live replicas'. After a restart of the node, it starts to replicate 
both types of blocks, but after a while it only replicates under-replicated 
blocks that have other live copies. I would think the 'normal' way to do this 
would be to make sure that all blocks for which this node keeps the only copy 
are the first to be replicated/balanced? Another thing is that this takes 
'forever'. At the rate it's going now, it will run for a couple of months 
before I can take down the node for maintenance. It only has approx. 250 GB 
of data in total.



> Wrong priority of replication
> -----------------------------
>
>                 Key: HDFS-7787
>                 URL: https://issues.apache.org/jira/browse/HDFS-7787
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.6.0
>         Environment: 2 namenodes HA, 6 datanodes in two racks
>            Reporter: Frode Halvorsen
>              Labels: balance, hdfs, replication-performance
>
> Each file has a replication factor of 3, with replicas split across different racks.
> After a simulated crash of one rack (shutdown of all nodes, deletion of the 
> data directory, and restart of the nodes) and decommissioning of one of the 
> nodes in the other rack, replication does not follow the 'normal' rules...
> My cluster has approx. 25 million files, and the node I am now trying to 
> decommission has 9 million under-replicated blocks and 3.5 million blocks 
> with 'no live replicas'. After a restart of the node, it starts to replicate 
> both types of blocks, but after a while it only replicates under-replicated 
> blocks that have other live copies. I would think the 'normal' way to do 
> this would be to make sure that all blocks for which this node keeps the 
> only copy are the first to be replicated/balanced?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
