chuanjie.duan created HDFS-16349:
------------------------------------
Summary: FSEditLog checkForGaps breaks HDFS RollingUpgrade Rollback
Key: HDFS-16349
URL: https://issues.apache.org/jira/browse/HDFS-16349
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Affects Versions: 3.2.2
Reporter: chuanjie.duan
2021-11-22 20:36:44,440 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest log: 10.65.57.133:8485=segmentState {
startTxId: 3906965
endTxId: 3906965
isInProgress: false
}
lastWriterEpoch: 5
lastCommittedTxId: 3906964
2021-11-22 20:36:44,457 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /data12/data/flashHadoopU/namenode/current
2021-11-22 20:36:44,495 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data12/data/flashHadoopU/namenode/current/edits_inprogress_0000000000003898378 -> /data12/data/flashHadoopU/namenode/current/edits_0000000000003898378-0000000000003898412
2021-11-22 20:36:44,657 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 2510934 but unable to find any edit logs containing txid 2510933
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1578)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1536)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:652)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:976)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
2021-11-22 20:36:44,660 INFO org.mortbay.log: Stopped HttpServer2$selectchannelconnectorwithsafestar...@pro-hadoop-dc01-057133.vm.dc01.hellocloud.tech:50070
2021-11-22 20:36:44,760 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2021-11-22 20:36:44,761 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
Old version: 2.7.3
New version: 3.2.2
Steps to Reproduce
Step 1: Start NN1 as active and NN2 as standby.
Step 2: Run "hdfs dfsadmin -rollingUpgrade prepare".
Step 3: Start NN2 as active and NN1 as standby with the rolling upgrade started option.
Step 4: Restart the DNs in upgrade mode as well.
Step 5: Restart the JournalNodes with the new Hadoop version.
Step 6: Wait a few days.
Step 7: Bring down both NNs, the JournalNodes, and the DNs.
Step 8: Start the JNs with the old version.
Step 9: Start NN1 with the rolling upgrade rollback option. The NN fails to
start with the error above (the edit log containing txid 2510933 has already
been deleted by the checkpoint mechanism).
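
To illustrate why the rollback fails, here is a minimal Python sketch of a gap check of this kind. It is a simplified model for discussion, not a copy of FSEditLog.checkForGaps: the helper names parse_segment and first_missing_txid are hypothetical, and the only inputs taken from this report are the finalized segment name and the two txids in the exception message.

```python
import re

# Finalized edit segments are named edits_<firstTxId>-<lastTxId>,
# as seen in the "Finalizing edits file" log line of this report.
SEGMENT_RE = re.compile(r"edits_(\d+)-(\d+)$")


def parse_segment(name):
    """Return (first_txid, last_txid) for a finalized segment name, else None."""
    m = SEGMENT_RE.search(name)
    return (int(m.group(1)), int(m.group(2))) if m else None


def first_missing_txid(segments, from_txid, to_at_least_txid):
    """Walk segments in txid order (hypothetical helper, simplified model).

    Returns None when every txid in [from_txid, to_at_least_txid] is covered,
    otherwise the first txid that no segment contains.
    """
    txid = from_txid
    for first, last in sorted(segments):
        if txid > to_at_least_txid:
            return None          # required range already covered
        if first > txid:
            break                # next segment starts too late: gap at txid
        if last >= txid:
            txid = last + 1      # segment covers txid, continue after it
    return None if txid > to_at_least_txid else txid


# The only finalized segment visible in the log starts long after the
# txid that the rollback needs, so the check reports a gap at 2510933.
remaining = [parse_segment("edits_0000000000003898378-0000000000003898412")]
print(first_missing_txid(remaining, 2510933, 2510934))
```

Under this model, once checkpointing has purged the older segments, no amount of newer edits can satisfy a rollback that has to replay from txid 2510933.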