[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589557#comment-16589557
 ] 

Rajesh Chandramohan commented on HDFS-13596:
--------------------------------------------

While upgrading from Hadoop 2.7.3 to Hadoop 3.1.0:

The JournalNodes are all running Hadoop 3.x.

During the rolling upgrade, one of the NNs loaded the fsimage and edit logs, 
then got stuck waiting for block reports, without any errors. All DataNodes 
are still on Hadoop 2.7. Block reporting doesn't progress. Has anybody faced 
this?

++
{code:java}
The reported blocks 0 needs additional 204812  blocks to reach the threshold 
1.0000 of total blocks 204812. The number of live datanodes 11 has reached the 
minimum number 0. {code}
++
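
For readers unfamiliar with the safe mode message above, here is a minimal sketch of the arithmetic it reports. This is not Hadoop's actual code: the class and method names are hypothetical, and the only inputs are the numbers in the message (0 reported blocks, 204812 total blocks, threshold 1.0000). In HDFS the threshold ratio is normally set through dfs.namenode.safemode.threshold-pct.

```java
public class SafeModeThresholdSketch {

    // Number of blocks that must be reported before the threshold is met.
    // Hypothetical helper; mirrors the "threshold X of total blocks Y" wording.
    static long blockThreshold(long totalBlocks, double thresholdPct) {
        return (long) (totalBlocks * thresholdPct);
    }

    // Blocks still missing; never negative once the threshold is reached.
    static long blocksStillNeeded(long reported, long totalBlocks, double thresholdPct) {
        return Math.max(0L, blockThreshold(totalBlocks, thresholdPct) - reported);
    }

    public static void main(String[] args) {
        long total = 204812L;   // "total blocks 204812" from the message
        double pct = 1.0;       // "threshold 1.0000" from the message
        long reported = 0L;     // "The reported blocks 0"
        // Prints 204812: no DataNode block report has been counted yet.
        System.out.println(blocksStillNeeded(reported, total, pct));
    }
}
```

With 0 reported blocks the NN needs every one of the 204812 blocks, so it will stay in safe mode until the DataNodes' block reports are actually processed.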

 

DN logs:
{code:java}
2018-08-22 16:11:44,748 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeCommand action : DNA_REGISTER from -nn.node.com./X.X:8030 with standby 
state
2018-08-22 16:11:44,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Reported NameNode version '3.1.0' does not match DataNode version '2.7.1-Prod' 
but is within acceptable limits. Note: This is normal during a rolling upgrade.
2018-08-22 16:11:44,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Block pool BP-1018191021-10.115.22.28-1436474857708 (Datanode Uuid 
3057a76f-b274-492c-a774-df767a260f09) service to nn.node.com/XX.XX:8030 
beginning handshake with NN
2018-08-22 16:11:44,788 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Block pool Block pool BP-1018191021-10.115.22.28-1436474857708 (Datanode Uuid 
3057a76f-b274-492c-a774-df767a260f09) service to nn.node.com/XX.XX:8030 
successfully registered with NN{code}

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -----------------------------------------------------
>
>                 Key: HDFS-13596
>                 URL: https://issues.apache.org/jira/browse/HDFS-13596
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Hanisha Koneru
>            Assignee: Zsolt Venczel
>            Priority: Blocker
>
> After a rollingUpgrade of the NN from 2.x to 3.x, if the NN is restarted, it 
> fails while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> Caused by: java.lang.IllegalStateException: Cannot skip to less than the 
> current value (=16389), where newValue=16388
>  at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1943)
> {code}
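
The failure mode described in the issue (old layout version in the edit log header, new transaction format in the body) can be illustrated with a toy example. This is only a sketch under stated assumptions: the record layout, the constants, and all names below are hypothetical and are not the real HDFS edit log format. It only shows how skipping one field that the writer did emit shifts every later field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class LayoutMismatchSketch {
    // Illustrative values only; layout versions are negative and decrease over time.
    static final int OLD_LAYOUT = -63;
    static final int EC_LAYOUT = -64;   // first version whose records carry an EC byte

    // Toy record: [int headerVersion][long inodeId][byte ecPolicy?][int idLen][id bytes]
    static byte[] writeLog(int headerVersion, boolean writeEcByte) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(headerVersion);
            out.writeLong(16388L);
            if (writeEcByte) {
                out.writeByte(0);       // extra field emitted by the "new" writer
            }
            byte[] clientId = new byte[16];
            out.writeInt(clientId.length);
            out.write(clientId);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reader trusts the header version to decide whether the EC byte exists.
    static int readClientIdLength(byte[] log) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
            int version = in.readInt();
            in.readLong();
            if (version <= EC_LAYOUT) {
                in.readByte();          // only parsed for the "new" layout
            }
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Header and body agree: the length is read correctly.
        System.out.println(readClientIdLength(writeLog(OLD_LAYOUT, false))); // 16
        System.out.println(readClientIdLength(writeLog(EC_LAYOUT, true)));   // 16
        // Old header, new body: the unread EC byte shifts the length field.
        System.out.println(readClientIdLength(writeLog(OLD_LAYOUT, true)));  // 0
    }
}
```

In the mismatched case the reader consumes the unparsed EC byte as the first byte of the length field, so it sees a length of 0 instead of 16. That the toy value resembles the "length is 0 expected length 16" message above is a property of this made-up layout, not a claim about the exact bytes in a real edit log.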



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

