[
https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278968#comment-13278968
]
Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------
You are right that a failure of three random nodes leads to data loss. We know
that and cannot do anything about it.
The case here is different. The cluster has 6 replicas and ends up with 0 _as
the result of the current policy_, which makes the loss almost *inevitable*. This
can be avoided by a slight modification of the policy, making it smarter about
potentially flaky nodes.
Your question about complexity is similar to asking why we bother introducing
all the HA complexity if the primary and standby NameNodes can all fail
at once.
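
To make the idea concrete, here is a minimal sketch of the kind of policy tweak being described, not the actual patch attached to this issue: the NodeState fields and the chooseReplicaToDelete ordering below are illustrative assumptions. The point is that when trimming excess replicas of an over-replicated block, the copy on the most recently (re)registered, potentially flaky node goes first, so the surviving replicas stay on the nodes most likely to remain up.

{code:java}
// Illustrative sketch only -- not the HDFS-3368 patch itself.
import java.util.Collection;

class ReplicaTrimSketch {

  /** Minimal stand-in for the per-datanode state the NameNode tracks. */
  static class NodeState {
    final String name;
    final long lastHeartbeatMillis;   // most recent heartbeat from this node
    final long registrationMillis;    // when the node last (re)registered

    NodeState(String name, long lastHeartbeatMillis, long registrationMillis) {
      this.name = name;
      this.lastHeartbeatMillis = lastHeartbeatMillis;
      this.registrationMillis = registrationMillis;
    }
  }

  /**
   * Pick the replica to delete from an over-replicated block.
   * Hypothetical ordering: nodes that (re)registered most recently are
   * treated as potentially flaky and lose their replica first; ties are
   * broken in favor of the node with the stalest heartbeat.
   */
  static NodeState chooseReplicaToDelete(Collection<NodeState> holders) {
    NodeState candidate = null;
    for (NodeState n : holders) {
      if (candidate == null
          || n.registrationMillis > candidate.registrationMillis        // more recently (re)registered
          || (n.registrationMillis == candidate.registrationMillis
              && n.lastHeartbeatMillis < candidate.lastHeartbeatMillis)) { // staler heartbeat
        candidate = n;
      }
    }
    return candidate;
  }
}
{code}

With an ordering like this, replicas added by DataNodes that just came back up are the first to be trimmed, so the cluster never deletes the copies on the stable nodes only to see the flaky ones disappear again.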
> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
> Key: HDFS-3368
> URL: https://issues.apache.org/jira/browse/HDFS-3368
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Attachments: blockDeletePolicy-0.22.patch,
> blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down
> during cluster restart resulting in data loss.