Hanisha Koneru created HDFS-13596:
-------------------------------------
Summary: NN restart fails after RollingUpgrade from 2.x to 3.x
Key: HDFS-13596
URL: https://issues.apache.org/jira/browse/HDFS-13596
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Reporter: Hanisha Koneru
After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails
while replaying edit logs.
* After NN is started with rollingUpgrade, the layoutVersion written to
editLogs (before finalizing the upgrade) is the pre-upgrade layout version (so
as to support downgrade).
* When writing transactions to log, NN writes as per the current layout
version. In 3.x, erasureCoding bits are added to the editLog transactions.
* So any edit log written after the upgrade and before finalizing the upgrade
will have the old layout version but the new format of transactions.
* When NN is restarted and the edit logs are replayed, the NN reads the old
layout version from the editLog file. When parsing the transactions, it assumes
that the transactions are also from the previous layout and hence skips parsing
the erasureCoding bits.
* This cascades into reading the wrong set of bits for other fields and leads
to NN shutting down.
Sample error output:
{code:java}
java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected
length 16
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
at
org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
at
org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
2018-05-17 19:10:06,522 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception
loading fsimage
java.io.IOException: java.lang.IllegalStateException: Cannot skip to less than
the current value (=16389), where newValue=16388
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
Caused by: java.lang.IllegalStateException: Cannot skip to less than the
current value (=16389), where newValue=16388
at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1943)
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]