[
https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273669#comment-13273669
]
Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------
> do1, do2, do3 are chosen for adding a new block.
They are not chosen for new blocks; this is a different scenario.
do[1-3] went down a long time ago (and all of their blocks were re-replicated to
other nodes), but they were not put into the exclude list.
*On cluster restart* do[1-3] are brought up along with dn[1-3], so for a brief
period of time the block has 6 replicas, 3 of which need to be deleted. Under
the current default policy the replicas chosen for deletion are the ones on
dn[1-3], because those nodes have less free space. do[1-3] are flaky and die
shortly after sending their block reports on restart, so 10 minutes later all 6
replicas are gone.
Just as I described in my first comment. The bug is in the default policy. I'm
not defining a new one.
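To make the failure mode concrete, here is a minimal, self-contained sketch of the behavior described above. It is not the actual HDFS BlockPlacementPolicyDefault code; the Node class and chooseReplicasToDelete method are hypothetical stand-ins that only capture the "delete the excess replica on the node with the least free space" rule, with the made-up free-space numbers standing in for the scenario in this comment.
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ExcessReplicaDeletionSketch {

    /** Minimal stand-in for a datanode: just a name and its remaining space. */
    static class Node {
        final String name;
        final long remainingBytes;
        Node(String name, long remainingBytes) {
            this.name = name;
            this.remainingBytes = remainingBytes;
        }
    }

    /**
     * Pick which excess replicas to drop: repeatedly remove the replica on
     * the node with the least free space. This mirrors only the free-space
     * criterion described in the comment; it has no notion of node flakiness.
     */
    static List<Node> chooseReplicasToDelete(List<Node> replicas, int excess) {
        List<Node> candidates = new ArrayList<>(replicas);
        List<Node> toDelete = new ArrayList<>();
        for (int i = 0; i < excess; i++) {
            Node worst = candidates.stream()
                    .min(Comparator.comparingLong((Node n) -> n.remainingBytes))
                    .orElseThrow();
            candidates.remove(worst);
            toDelete.add(worst);
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // dn[1-3] are healthy but fuller; do[1-3] are flaky but nearly empty,
        // since their blocks were already re-replicated elsewhere.
        List<Node> replicas = List.of(
                new Node("dn1", 10L << 30), new Node("dn2", 12L << 30),
                new Node("dn3", 15L << 30), new Node("do1", 900L << 30),
                new Node("do2", 900L << 30), new Node("do3", 900L << 30));

        // 6 replicas with replication factor 3 -> 3 excess replicas to delete.
        for (Node n : chooseReplicasToDelete(replicas, 3)) {
            System.out.println("deleting replica on " + n.name);
        }
        // Prints dn1, dn2, dn3: only the copies on the flaky do[1-3] remain,
        // so when do[1-3] die again all replicas of the block are lost.
    }
}
{code}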
> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
> Key: HDFS-3368
> URL: https://issues.apache.org/jira/browse/HDFS-3368
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Attachments: blockDeletePolicy-0.22.patch,
> blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down
> during cluster restart, resulting in data loss.