[ 
https://issues.apache.org/jira/browse/HDFS-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815501#comment-15815501
 ] 

Clay B. commented on HDFS-11304:
--------------------------------

Hi [~jojochuang], when I hit this it was not just a crash: the NN failed to 
start even though the rest of the cluster had all the necessary data. (E.g., 
if a node had crashed due to a hardware/OS failure and then tried to rejoin 
the cluster to provide HA services, we had to manually rsync the edit dirs.)

> Namenode fails to start, even edit log available in the journal node
> --------------------------------------------------------------------
>
>                 Key: HDFS-11304
>                 URL: https://issues.apache.org/jira/browse/HDFS-11304
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, journal-node
>    Affects Versions: 2.8.0, 2.7.1
>         Environment: *HDP 2.4.2.0-258*
>            Reporter: Karthik P
>            Assignee: Karthik P
>              Labels: patch
>
> JN => JournalNode
> NN => Namenode local directory (_dfs.namenode.name.dir_)
> Y/N => Is edit file/log present?
> Ex : edits_0000000000001627921-0000000000001627961
> *Scenario:*
> ||JN 1||JN 2||JN 3||NN local||Is NN started?||
> |N|N|Y|N|Started|
> |Y|N|N|N|Started|
> |N|Y|N|N|Failed|
> |N|Y|N|Y|Started|
> |Y|Y|N|N|Started|
> *Note:* Namenode and JN2 installed on the same machine
> *Trace :*
>  ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start namenode.
> java.io.IOException: There appears to be a gap in the edit log.  We expected txid 1627921, but got txid 1627962.
>       at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
>       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:837)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:692)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:983)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:688)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:662)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:726)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:951)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:935)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1641)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1707)
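
The gap reported in the trace (expected txid 1627921, got txid 1627962) matches the example segment edits_0000000000001627921-0000000000001627961 being absent from the copy that was read. As a rough illustration only, and not how the NameNode itself selects edit streams, a gap of this kind could be spotted from a directory listing with a sketch like the one below. The function names are hypothetical, and the sketch assumes finalized segments are named edits_<firstTxId>-<lastTxId>, as in the example above.

```python
import re

# Assumption: finalized segments are named edits_<firstTxId>-<lastTxId>,
# as in the example file name from the issue description.
# In-progress segments (edits_inprogress_...) do not match and are ignored.
SEGMENT_RE = re.compile(r"^edits_(\d+)-(\d+)$")


def parse_segments(names):
    """Return sorted (first_txid, last_txid) pairs for finalized segments."""
    segments = []
    for name in names:
        match = SEGMENT_RE.match(name)
        if match:
            segments.append((int(match.group(1)), int(match.group(2))))
    return sorted(segments)


def find_gaps(names, start_txid):
    """Return (expected_txid, found_txid) pairs where txid coverage breaks.

    start_txid is the first transaction that still has to be replayed
    (hypothetical parameter for this sketch).
    """
    gaps = []
    expected = start_txid
    for first, last in parse_segments(names):
        if last < expected:
            continue  # segment lies entirely before the range we need
        if first > expected:
            gaps.append((expected, first))
        expected = max(expected, last + 1)
    return gaps
```

Run against the file names of an edits directory (for instance from os.listdir), an empty result means the finalized segments cover the range contiguously from start_txid onward; a pair such as (1627921, 1627962) points at the segment that would have to be copied over from another JN or NN.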



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
