Todd Lipcon created HDFS-3967:
---------------------------------

             Summary: NN should bail out earlier when logs to load have a gap
                 Key: HDFS-3967
                 URL: https://issues.apache.org/jira/browse/HDFS-3967
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
    Affects Versions: 2.0.1-alpha, 3.0.0
            Reporter: Todd Lipcon
            Priority: Minor

I was testing an HA setup with a lowered edit log retention period, and ended up in a state where one of the two NNs had fallen so far behind that it couldn't start up again (the edits it needed had already been purged because of the too-low retention period). When I started the NN, I got the following:

12/09/21 13:03:20 INFO namenode.FSImage: Loaded image for txid 45781083 from /tmp/name1-name/current/fsimage_0000000000045781083
12/09/21 13:03:20 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@239a0feb expecting start txid #45781084
12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://localhost:13081/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b, http://localhost:13082/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b, http://localhost:13083/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b' to transaction ID 45781084
12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://localhost:13081/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b' to transaction ID 45781084
12/09/21 13:03:20 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: There appears to be a gap in the edit log. We expected txid 45781084, but got txid 45928954.

The stream's first segment starts at txid 45928954, which is later than the txid we need (45781084). Rather than trying to 'fast forward' the stream to a transaction which is actually prior to its first tx, we should bail earlier with a nicer error.
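
The kind of early check proposed here could look roughly like the sketch below. This is only an illustration, not the actual NameNode code: the class EditLogGapCheck, the method checkNoGap and its parameters are hypothetical names invented for this example, and it assumes the caller already knows the first txid of each available edit log segment (as the segmentTxId in the URLs above suggests). It only detects a gap at the start of the logs, not gaps between later segments.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper for illustration; not part of the Hadoop code base.
final class EditLogGapCheck {

    private EditLogGapCheck() {}

    /**
     * Fails fast if the earliest available edit log segment starts after the
     * transaction the NameNode needs next (the image txid + 1).
     *
     * @param expectedStartTxId first txid that must be loaded
     * @param segmentStartTxIds first txid of each available segment (assumed input)
     */
    static void checkNoGap(long expectedStartTxId, List<Long> segmentStartTxIds)
            throws IOException {
        // Assumption: no segments means there is nothing to load, which this
        // sketch does not treat as an error.
        if (segmentStartTxIds.isEmpty()) {
            return;
        }

        // Find the earliest transaction that any segment can provide.
        long firstAvailable = Long.MAX_VALUE;
        for (long txId : segmentStartTxIds) {
            firstAvailable = Math.min(firstAvailable, txId);
        }

        // A stream cannot be fast-forwarded to a txid before its own start,
        // so report the gap before opening any stream.
        if (firstAvailable > expectedStartTxId) {
            throw new IOException("There appears to be a gap in the edit log."
                + " We expected txid " + expectedStartTxId
                + ", but the earliest available segment starts at txid "
                + firstAvailable + ".");
        }
    }
}
```

With the values from the log above, checkNoGap(45781084L, List.of(45928954L)) would throw before any fast-forwarding is attempted, while a segment list that also contains a segment starting at 45781084 would pass.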