[jira] [Created] (HDFS-6964) NN fails to fix under replication leading to data loss

Daryn Sharp (JIRA) Thu, 28 Aug 2014 10:28:35 -0700

Daryn Sharp created HDFS-6964:
---------------------------------

             Summary: NN fails to fix under replication leading to data loss
                 Key: HDFS-6964
                 URL: https://issues.apache.org/jira/browse/HDFS-6964
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.0.0-alpha, 3.0.0
            Reporter: Daryn Sharp
            Priority: Blocker



We've lost blocks to node failures even when there was ample time to fix the 
under-replication.

Two nodes were lost.  The third node, which held the last remaining replicas, 
averaged 1 block copy per heartbeat (3s) until ~7h later, when that node was 
also lost, resulting in over 50 lost blocks.  When the node was restarted and 
sent its block report (BR), the NN immediately began fixing the replication.
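
For scale, here is a rough back-of-envelope sketch of how many block copies 
the observed rate allows over that window.  It uses only the figures quoted 
above (3s heartbeat, 1 copy per heartbeat, ~7h); the function and constant 
names are illustrative, and the number of under-replicated blocks the node 
actually held is not stated in this report.

```python
# Back-of-envelope estimate; all inputs are the approximate figures
# quoted in the report, not measured values.
HEARTBEAT_INTERVAL_S = 3   # heartbeat interval quoted in the report
BLOCKS_PER_HEARTBEAT = 1   # observed average copy rate
WINDOW_HOURS = 7           # approximate time until the third node was lost


def blocks_copied(window_hours, interval_s, blocks_per_heartbeat):
    """Return how many block copies fit in the window at the given rate."""
    heartbeats = int(window_hours * 3600 // interval_s)
    return heartbeats * blocks_per_heartbeat


total = blocks_copied(WINDOW_HOURS, HEARTBEAT_INTERVAL_S, BLOCKS_PER_HEARTBEAT)
print(total)  # 8400
```

At that rate roughly 8,400 copies could be scheduled in 7 hours, so whether 
the window was sufficient depends on how many blocks that node still needed 
to replicate.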

In another data loss event, over 150 blocks were lost due to node failure, but 
the timing of the node loss is not known.  Unlike the first case, there may 
have been inadequate time to fix the under-replication.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
