[ https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278968#comment-13278968 ]

Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------

You are right that a failure of three random nodes leads to data loss. We know 
that and cannot do anything about it.

The case here is different. The cluster has 6 replicas and ends up with 0 _as 
the result of the current policy_, which makes the loss almost *inevitable*. 
This can be avoided by a slight modification to the policy, making it smarter 
about potentially flaky nodes.
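
To make the idea concrete, here is a rough, self-contained sketch of one way to 
bias excess-replica deletion away from stable nodes. This is not the attached 
blockDeletePolicy patch (whose heuristic may differ); ReplicaLocation and 
chooseReplicasToDelete are hypothetical names, not HDFS classes. The idea 
illustrated: when trimming an over-replicated block, delete from the nodes with 
the stalest heartbeats first, so the replicas that survive sit on the nodes that 
have most recently proven responsive.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a DataNode descriptor; not an actual HDFS class.
class ReplicaLocation {
    final String datanodeId;
    final long lastHeartbeatMillis;   // last time this node checked in with the NameNode
    ReplicaLocation(String datanodeId, long lastHeartbeatMillis) {
        this.datanodeId = datanodeId;
        this.lastHeartbeatMillis = lastHeartbeatMillis;
    }
}

public class ExcessReplicaChooser {
    // Trim an over-replicated block by deleting from the least recently
    // heard-from (potentially flaky) nodes first, so the surviving replicas
    // stay on the nodes that have most recently proven responsive.
    static List<ReplicaLocation> chooseReplicasToDelete(
            List<ReplicaLocation> replicas, int targetReplication) {
        List<ReplicaLocation> candidates = new ArrayList<>(replicas);
        candidates.sort(Comparator.comparingLong(r -> r.lastHeartbeatMillis)); // stalest first
        int excess = Math.max(0, candidates.size() - targetReplication);
        return new ArrayList<>(candidates.subList(0, excess));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<ReplicaLocation> replicas = List.of(
            new ReplicaLocation("dn1", now),              // stable nodes
            new ReplicaLocation("dn2", now - 1_000),
            new ReplicaLocation("dn3", now - 5_000),
            new ReplicaLocation("dn4", now - 600_000),    // came back briefly, then went silent
            new ReplicaLocation("dn5", now - 900_000),
            new ReplicaLocation("dn6", now - 1_200_000));
        // With replication factor 3, the three stalest nodes lose their copies,
        // not the three healthy ones.
        chooseReplicasToDelete(replicas, 3)
            .forEach(r -> System.out.println("delete replica on " + r.datanodeId));
    }
}
{code}

The point of the sketch is only the ordering: the existing policy can pick the 
replicas on the good nodes for deletion and keep the ones on the nodes that are 
about to disappear again, which is exactly how 6 replicas become 0.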

Your question about complexity is similar to asking why we bother introducing 
all the HA complexity when the primary and standby NameNodes can all fail at 
once.
                
> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: blockDeletePolicy-0.22.patch, 
> blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down 
> during cluster restart resulting in data loss.
