[ https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273669#comment-13273669 ]

Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------

> do1, do2, do3 are chosen for adding a new block.

They are not chosen for new blocks; this is a different scenario.
do[1-3] went down a long time ago (and all of their blocks were
re-replicated to other nodes), but they were never put into the exclude
list.
*On cluster restart* do[1-3] come up along with dn[1-3], so for a brief
period the block has 6 replicas, 3 of which must be deleted. Under the
current default policy the replicas on dn[1-3] are chosen for deletion,
because those nodes have less free space. do[1-3] are flaky and die
shortly after sending their block reports on restart, so 10 minutes
later all 6 replicas are gone.
Just as I described in my first comment: the bug is in the default
policy; I'm not defining a new one.
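
To make the failure mode concrete, here is a minimal, self-contained
Java sketch of the least-free-space heuristic described above. This is
an illustration only, not the actual chooseReplicaToDelete code in
BlockPlacementPolicyDefault (the real policy also weighs heartbeat
age); the DatanodeInfo class below is a hypothetical stand-in, and only
getRemaining() mirrors the real API.

{code:java}
import java.util.List;

// Minimal stand-in for Hadoop's DatanodeInfo; only getRemaining()
// mirrors the real API.
class DatanodeInfo {
  private final String name;
  private final long remainingBytes; // free space left on the node

  DatanodeInfo(String name, long remainingBytes) {
    this.name = name;
    this.remainingBytes = remainingBytes;
  }

  long getRemaining() { return remainingBytes; }

  @Override
  public String toString() { return name; }
}

class LeastFreeSpaceDeletion {
  // Pick the replica to delete: the one on the node with the least
  // free space. This is the behavior at issue: the busy, healthy
  // nodes dn[1-3] lose their replicas first, while the nearly empty,
  // flaky nodes do[1-3] keep theirs.
  static DatanodeInfo chooseReplicaToDelete(List<DatanodeInfo> replicas) {
    DatanodeInfo victim = null;
    for (DatanodeInfo d : replicas) {
      if (victim == null || d.getRemaining() < victim.getRemaining()) {
        victim = d;
      }
    }
    return victim;
  }

  public static void main(String[] args) {
    List<DatanodeInfo> replicas = List.of(
        new DatanodeInfo("dn1", 10L << 30),   // busy node, ~10 GB free
        new DatanodeInfo("do1", 500L << 30)); // empty flaky node, ~500 GB free
    // Over-replication cleanup deletes the dn1 replica, leaving the
    // block only on the flaky do1, which then dies.
    System.out.println("delete replica on " + chooseReplicaToDelete(replicas));
  }
}
{code}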
                
> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: blockDeletePolicy-0.22.patch, 
> blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down 
> during a cluster restart, resulting in data loss.
