Todd Lipcon created HDFS-3967:
---------------------------------

             Summary: NN should bail out earlier when logs to load have a gap
                 Key: HDFS-3967
                 URL: https://issues.apache.org/jira/browse/HDFS-3967
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
    Affects Versions: 2.0.1-alpha, 3.0.0
            Reporter: Todd Lipcon
            Priority: Minor


I was testing an HA setup with a lowered edit log retention period, and ended
up in a state where one of the two NNs had fallen too far behind, such that it 
couldn't start up again (due to the too-low retention period). When I started 
the NN, I got the following:

12/09/21 13:03:20 INFO namenode.FSImage: Loaded image for txid 45781083 from /tmp/name1-name/current/fsimage_0000000000045781083
12/09/21 13:03:20 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@239a0feb expecting start txid #45781084
12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://localhost:13081/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b, http://localhost:13082/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b, http://localhost:13083/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b' to transaction ID 45781084
12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://localhost:13081/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b' to transaction ID 45781084
12/09/21 13:03:20 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: There appears to be a gap in the edit log.  We expected txid 45781084, but got txid 45928954.

Rather than trying to 'fast forward' the stream to a transaction ID that
actually precedes the stream's first transaction, the NN should bail out
earlier with a clearer error.
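
As a rough illustration of the kind of up-front check being proposed, here is a minimal sketch in plain Java. It is not the actual NameNode code or the eventual patch: the class EditLogGapCheck, the Segment type, and the checkNoGap method are hypothetical names made up for this example, and it assumes the candidate segments are already sorted by their first txid. The idea is only to compare the txid expected after the loaded image against the first txid of the available segments before opening any stream.

```java
import java.io.IOException;
import java.util.List;

public class EditLogGapCheck {

    /** Hypothetical stand-in for one edit log segment (inclusive txid range). */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;

        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    /**
     * Verifies that the segments cover every txid from expectedStartTxId
     * onward without a hole. Assumes segments are sorted by firstTxId.
     *
     * @throws IOException if the next needed txid is not available
     */
    static void checkNoGap(long expectedStartTxId, List<Segment> segments)
            throws IOException {
        long nextTxId = expectedStartTxId;
        for (Segment s : segments) {
            if (s.lastTxId < nextTxId) {
                // Segment ends before the txid we need; nothing to load from it.
                continue;
            }
            if (s.firstTxId > nextTxId) {
                // The segment starts after the txid we need: fail now, before
                // any attempt to fast-forward a stream "backwards".
                throw new IOException("Gap in edit logs: need txid " + nextTxId
                        + " but the earliest available segment starts at txid "
                        + s.firstTxId);
            }
            nextTxId = s.lastTxId + 1;
        }
    }
}
```

With the values from the log above, a call such as checkNoGap(45781084L, segments), where the only available segment starts at txid 45928954, would fail immediately with a message naming both txids instead of first logging the fast-forward attempts.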

