[
https://issues.apache.org/jira/browse/HDFS-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829861#comment-13829861
]
Fengdong Yu commented on HDFS-5553:
-----------------------------------
{code}
2013-11-22 18:12:53,460 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream 'http://test2.com:8480/getJournal?jid=test-cluster&segmentTxId=39&storageInfo=-48%3A412886569%3A1385114618309%3ACID-d359fe59-5e40-41b3-bc18-8595bcc5af07, http://test1.com:8480/getJournal?jid=test-cluster&segmentTxId=39&storageInfo=-48%3A412886569%3A1385114618309%3ACID-d359fe59-5e40-41b3-bc18-8595bcc5af07, http://test3.com:8480/getJournal?jid=test-cluster&segmentTxId=39&storageInfo=-48%3A412886569%3A1385114618309%3ACID-d359fe59-5e40-41b3-bc18-8595bcc5af07' to transaction ID 38
2013-11-22 18:12:53,460 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream 'http://test2.com:8480/getJournal?jid=test-cluster&segmentTxId=39&storageInfo=-48%3A412886569%3A1385114618309%3ACID-d359fe59-5e40-41b3-bc18-8595bcc5af07' to transaction ID 38
2013-11-22 18:12:53,771 INFO org.mortbay.log: Stopped [email protected]:50070
2013-11-22 18:12:53,872 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2013-11-22 18:12:53,873 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: ganglia thread interrupted.
2013-11-22 18:12:53,873 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2013-11-22 18:12:53,874 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2013-11-22 18:12:53,875 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: There appears to be a gap in the edit log. We expected txid 38, but got txid 39.
        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:189)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:117)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:730)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:644)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:261)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:859)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:621)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:692)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:677)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1283)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1349)
2013-11-22 18:12:53,880 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-11-22 18:12:53,883 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
{code}
> SNN crashed because edit log has gap after upgrade
> --------------------------------------------------
>
> Key: HDFS-5553
> URL: https://issues.apache.org/jira/browse/HDFS-5553
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, hdfs-client
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Fengdong Yu
> Priority: Blocker
>
> As HDFS-5550 describes, the journal nodes are not upgraded, so I changed their
> VERSION files manually to match the NN's VERSION.
> Then I ran the upgrade and got this exception.
> My steps were as follows:
> It was a fresh cluster running hadoop-2.0.1 before the upgrade.
> 0) install the hadoop-2.2.0 package on all nodes.
> 1) stop-dfs.sh on the active NN
> 2) disable HA in core-site.xml and hdfs-site.xml on the active NN and SNN
> 3) start-dfs.sh -upgrade -clusterId test-cluster on the active NN (only one NN
> now)
> 4) stop-dfs.sh after the active NN has started successfully.
> 5) change all journal nodes' VERSION files manually to match the NN's VERSION
> 6) rm -f 'dfs.journalnode.edits.dir'/test-cluster/current/* (keep only
> VERSION there)
> 7) delete all data under 'dfs.namenode.name.dir' on the SNN
> 8) scp -r 'dfs.namenode.name.dir' from the active NN to the SNN
> 9) start-dfs.sh
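
The manual fix in steps 5 and 6 can be sketched locally as below. This is a minimal illustration, not the reported procedure verbatim: NN_DIR and JN_DIR are temporary stand-in directories, and the VERSION contents and segment file name are fabricated for the example; on a real cluster you would operate on dfs.namenode.name.dir and dfs.journalnode.edits.dir.

```shell
set -eu

# Hypothetical stand-ins for <dfs.namenode.name.dir>/current and
# <dfs.journalnode.edits.dir>/test-cluster/current (illustration only).
NN_DIR=$(mktemp -d)
JN_DIR=$(mktemp -d)

# Fabricated VERSION files purely for the sketch.
printf 'layoutVersion=-48\nclusterID=test-cluster\n' > "$NN_DIR/VERSION"
printf 'layoutVersion=-40\nclusterID=test-cluster\n' > "$JN_DIR/VERSION"
touch "$JN_DIR/edits_inprogress_0000000000000000039"   # stale edit segment

# Step 5: overwrite the JournalNode VERSION with the NameNode's copy.
cp "$NN_DIR/VERSION" "$JN_DIR/VERSION"

# Step 6: delete everything under the journal dir except VERSION.
find "$JN_DIR" -mindepth 1 ! -name VERSION -delete
```

Using `find ... ! -name VERSION -delete` instead of the `rm -f .../current/*` from step 6 removes the stale segments without relying on shell globbing, while leaving only VERSION behind.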
--
This message was sent by Atlassian JIRA
(v6.1#6144)