[ https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278968#comment-13278968 ]
Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------

You are right: a failure of three random nodes leads to data loss. We know that and cannot do anything about it. The case here is different. The cluster has 6 replicas and ends up with 0 _as the result of the current policy_, which makes the loss almost *inevitable*. This can be avoided by a slight modification of the policy, making it smarter about potentially flaky nodes. Your question about complexity is similar to asking why we bother introducing all the HA complexity if all NameNodes, primary and standby, can fail at once.

> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: blockDeletePolicy-0.22.patch, blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down during cluster restart, resulting in data loss.
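To illustrate the kind of policy change being proposed (this is a hypothetical sketch, not the attached blockDeletePolicy patch; all class and field names are invented): when a block is over-replicated, prefer deleting the replicas held by nodes with a recent history of going up and down, so that copies on stable nodes survive a flaky restart.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of an excess-replica deletion heuristic that is
// "smarter about potentially flaky nodes": among the DataNodes holding
// an over-replicated block, delete the copy on the flakiest node first.
class ReplicaDeletionPolicy {
    static class Node {
        final String name;
        final int recentRestarts; // how often the node flapped recently (assumed tracked elsewhere)

        Node(String name, int recentRestarts) {
            this.name = name;
            this.recentRestarts = recentRestarts;
        }
    }

    // Pick the node whose replica should be removed: the one with the
    // worst recent stability record. Stable nodes keep their copies, so
    // a later wave of flaky-node failures cannot take out all replicas.
    static Node chooseReplicaToDelete(List<Node> holders) {
        return Collections.max(holders,
                Comparator.comparingInt(n -> n.recentRestarts));
    }
}
```

The point of the heuristic is exactly the 6-replicas-to-0 scenario above: the current policy can delete the copies on the nodes that stayed up and keep the ones on nodes about to disappear again.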