[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813548#comment-16813548
 ] 

Daryn Sharp commented on HDFS-13596:
------------------------------------

I'm really glad this has gained traction!  Otherwise 3.x hdfs was DOA...
{quote}  New properties will be ignored as I checked the code. There ls a 
length field in every edit log.
{quote}
I can't remember exact details but when I tried to take advantage of this 
behavior in ~2.6-2.7 it was completely broken so might double check.
 # The check for EC support should be in {{FSNamesystem}} methods, not 
{{NameNodeRpcServer}}, since there can be multiple entry points to the 
namesystem like webhdfs.
 # {{DFSUtil.isSupportedErasureCoding}} probably doesn't belong in {{DFSUtil}} 
since it's not something that should called outside of the NN.
 # In {{FSEditLogOp}}, please call the former method instead of duplicating the 
logic.
 # Super trivial, might rename new {{layoutVersion}} parameter in the write 
methods to {{logVersion}} to be consistent with the signatures for the read 
methods.
 # How about a {{FSNamesystem.checkErasureCodingSupported(String op)}} to avoid 
all the redundant check/throw code in the methods?
 # A test case is needed to prove the edits are correctly read/written.

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -----------------------------------------------------
>
>                 Key: HDFS-13596
>                 URL: https://issues.apache.org/jira/browse/HDFS-13596
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Hanisha Koneru
>            Assignee: Fei Hui
>            Priority: Critical
>         Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch
>
>
> After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> Caused by: java.lang.IllegalStateException: Cannot skip to less than the 
> current value (=16389), where newValue=16388
>  at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1943)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to