[ https://issues.apache.org/jira/browse/HDFS-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HDFS-1382.
------------------------------------
    Resolution: Fixed

> A transient failure with edits log and a corrupted fstime together could lead to a data loss
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1382
>                 URL: https://issues.apache.org/jira/browse/HDFS-1382
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Thanh Do
>
> We experienced data loss caused by a double failure: a transient disk
> failure affecting the edits log, followed by a corrupted fstime.
>
> Here are the details:
>
> 1. The NameNode has two edits directories (say edit0 and edit1).
>
> 2. During an update to edit0, a transient disk failure occurs.
> The NameNode bumps the fstime, marks edit0 as stale,
> and continues working with edit1 only.
>
> 3. The NameNode is shut down. Unluckily, the fstime in edit0
> is now corrupted. As a result, during NameNode startup the log in edit0
> is replayed, and the updates recorded only in edit1 are lost.
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
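
The three steps in the quoted report can be sketched as a small simulation. This is a toy model, not HDFS code: the names `EditsDir`, `log_edit`, `recover`, and `simulate` are hypothetical, and it assumes that startup simply replays the directory whose fstime reads as the largest, and that the corrupted fstime in edit0 happens to read as newer than the one in edit1.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class EditsDir:
    """Hypothetical stand-in for one NameNode edits directory."""
    name: str
    fstime: int = 0                      # timestamp stored in the directory
    edits: List[str] = field(default_factory=list)
    stale: bool = False                  # in-memory flag, lost on shutdown


def log_edit(dirs: List[EditsDir], op: str, failing: Iterable[str] = ()) -> None:
    """Append op to every healthy directory (step 2 of the report).

    A directory whose write fails is marked stale, and fstime is bumped
    in the surviving directories so they look newer at the next startup.
    """
    failed_now = False
    for d in dirs:
        if d.stale:
            continue
        if d.name in failing:
            d.stale = True
            failed_now = True
        else:
            d.edits.append(op)
    if failed_now:
        for d in dirs:
            if not d.stale:
                d.fstime += 1


def recover(dirs: List[EditsDir]) -> List[str]:
    """Assumed startup rule: replay the directory with the largest fstime."""
    newest = max(dirs, key=lambda d: d.fstime)
    return list(newest.edits)


def simulate(corrupt_fstime: bool) -> List[str]:
    edit0, edit1 = EditsDir("edit0"), EditsDir("edit1")
    dirs = [edit0, edit1]
    log_edit(dirs, "mkdir /a")                       # both directories healthy
    log_edit(dirs, "mkdir /b", failing={"edit0"})    # transient failure on edit0
    log_edit(dirs, "mkdir /c")                       # only edit1 is written
    if corrupt_fstime:
        edit0.fstime = 99   # assumed corrupted value that reads as "newer"
    return recover(dirs)
```

Under these assumptions, `simulate(False)` replays all three operations from edit1, while `simulate(True)` replays only the first one from edit0, which is the data loss described in step 3.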