[
https://issues.apache.org/jira/browse/HDFS-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831826#comment-15831826
]
Greg Senia commented on HDFS-11304:
-----------------------------------
We saw this again last night with HDP 2.5.3.0 and Ambari 2.4.2.0 I think this
revolves around clusters with things that are not seeing safemode occur??? This
seems to occur when HDFS is restarted but rest of the services are left up.
Data that was in the missing/uncomplete edits file:
[root@ha21t52nn hdfs]# strings
/hadoop/hdfs/journal/tech/current/edits_inprogress_0000000000008318735 | less
#DFSClient_NONMAPREDUCE_1766868484_1
10.70.33.1
{gs+
4/spark-history/.c33046a8-d2f4-4980-87ac-7cf22922a4e4
spark
hadoop
4/spark-history/.c33046a8-d2f4-4980-87ac-7cf22922a4e4
{gs+
4/spark-history/.baf302ac-c95e-49c5-92a1-db88beedfafb
spark
hadoop
#DFSClient_NONMAPREDUCE_1766868484_1
10.70.33.1
{gs+
4/spark-history/.baf302ac-c95e-49c5-92a1-db88beedfafb
spark
hadoop
4/spark-history/.baf302ac-c95e-49c5-92a1-db88beedfafb
{gs+
4/spark-history/.75dfd3b5-aa25-46d4-8f47-e8ba05be92b8
spark
hadoop
#DFSClient_NONMAPREDUCE_1766868484_1
10.70.33.1
{gs+
4/spark-history/.75dfd3b5-aa25-46d4-8f47-e8ba05be92b8
spark
hadoop
4/spark-history/.75dfd3b5-aa25-46d4-8f47-e8ba05be92b8
{gs+
4/spark-history/.454321c2-0f1f-4aac-9fde-3dacb06d9088
spark
hadoop
#DFSClient_NONMAPREDUCE_1766868484_1
10.70.33.1
{gs+
4/spark-history/.454321c2-0f1f-4aac-9fde-3dacb06d9088
spark
hadoop
4/spark-history/.454321c2-0f1f-4aac-9fde-3dacb06d9088
{gs+
4/spark-history/.cb32576d-b241-47f0-9206-4cfdde2daf82
spark
hadoop
#DFSClient_NONMAPREDUCE_1766868484_1
10.70.33.1
{gs+
4/spark-history/.cb32576d-b241-47f0-9206-4cfdde2daf82
spark
hadoop
4/spark-history/.cb32576d-b241-47f0-9206-4cfdde2daf82
{gs+
2017-01-19 20:23:03,720 ERROR namenode.NameNode
(NameNode.java:doImmediateShutdown(1891)) - Error encountered requiring NN
shutdown. Shutting down immediately.
java.io.IOException: There appears to be a gap in the edit log. We expected
txid 8318735, but got txid 8318778.
at
org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:843)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
at
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
at
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:188)
at
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:182)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1740)
at
org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:509)
at
org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:490)
at
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:182)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1143)
at
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1915)
at
org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at
org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1783)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1631)
at
org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
at
org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1740)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2309)
2017-01-19 20:23:03,721 INFO util.ExitUtil (ExitUtil.java:terminate(124)) -
Exiting with status 1
> Namenode fails to start, even edit log available in the journal node
> --------------------------------------------------------------------
>
> Key: HDFS-11304
> URL: https://issues.apache.org/jira/browse/HDFS-11304
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, journal-node
> Affects Versions: 2.8.0, 2.7.1
> Environment: *HDP 2.4.2.0-258*
> Reporter: Karthik P
> Assignee: Karthik P
> Labels: patch
>
> JN => JournalNode
> NN => Namenode local directory (_dfs.namenode.name.dir_)
> Y/N => Is edit file/log present?
> Ex : edits_0000000000001627921-0000000000001627961
> *Scenario:*
> ||JN 1||JN 2||JN 3||NN local|| Is NN started?
> |N|N|Y|N|Started|
> |Y|N|N|N|Started|
> |N|Y|N|N|Failed|
> |N|Y|N|Y|Started|
> |Y|Y|N|N|Started|
> *Note:* Namenode and JN2 installed on the same machine
> *Trace :*
> ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start
> namenode.
> java.io.IOException: There appears to be a gap in the edit log. We expected
> txid 1627921, but got txid 1627962.
> at
> org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:837)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:692)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:983)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:688)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:662)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:726)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:951)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:935)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1641)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1707)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]