[ https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279247#comment-13279247 ]

Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------

This is pretty rare, but when you hit it, it takes a while to figure out what 
went wrong. If not fixed, the problem becomes a maintenance issue: ops will 
have to remember to add every failed node to the exclude list, which is 
sometimes not obvious and definitely time consuming.
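
For reference, the exclude list here is the file pointed to by dfs.hosts.exclude 
in hdfs-site.xml; the file path below is only an example, not a required location:

  <property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/conf/dfs.exclude</value>
  </property>

After adding a failed node's hostname to that file, ops still have to run 
hadoop dfsadmin -refreshNodes (hdfs dfsadmin -refreshNodes on newer releases) 
for the NameNode to pick up the change.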

Block allocation does not take heartbeats into account. As you know, there are 
other mechanisms there, like DN load. Even if a new replica is assigned to a node 
that has recently gone down, this will be detected during data transfer and a 
new location will be assigned.
I don't see how it should correlate with the delete policy.
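
To illustrate the allocation / recovery behavior described above, here is a rough 
sketch. The class and method names (AllocationRecoverySketch, TargetNode, 
chooseTargets, transferBlock) are made up for illustration and are not the actual 
NameNode or DFSClient code:

// Rough sketch only: the classes and methods below are hypothetical,
// not the real NameNode / DFSClient implementation.
import java.util.ArrayList;
import java.util.List;

class AllocationRecoverySketch {

    /** Stand-in for a DataNode location handed out by the NameNode. */
    static class TargetNode {
        final String host;
        TargetNode(String host) { this.host = host; }
    }

    /** Hypothetical allocation call: picks targets by placement policy and
     *  DN load, without consulting how recent the node's last heartbeat was. */
    static List<TargetNode> chooseTargets(int needed, List<TargetNode> excluded) {
        return new ArrayList<TargetNode>(); // placement logic elided
    }

    /** Hypothetical transfer; returns false if the target is unreachable. */
    static boolean transferBlock(TargetNode target) {
        return true; // socket write elided
    }

    /** A dead target is discovered only when the transfer fails; the client
     *  then excludes it and asks the NameNode for a fresh location. */
    static void writeBlock(int replication) {
        List<TargetNode> excluded = new ArrayList<TargetNode>();
        int placed = 0;
        while (placed < replication) {
            List<TargetNode> targets = chooseTargets(replication - placed, excluded);
            if (targets.isEmpty()) {
                break; // no candidates left
            }
            for (TargetNode target : targets) {
                if (transferBlock(target)) {
                    placed++;             // replica streamed successfully
                } else {
                    excluded.add(target); // node found dead only at transfer time
                }
            }
        }
    }
}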
                
> Missing blocks due to bad DataNodes coming up and down.
> --------------------------------------------------------
>
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: blockDeletePolicy-0.22.patch, 
> blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down 
> during cluster restart, resulting in data loss.
