[ https://issues.apache.org/jira/browse/HDFS-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brahma Reddy Battula resolved HDFS-3908.
----------------------------------------
    Resolution: Won't Fix

Closing as Won't Fix as per the discussion [here|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201607.mbox/%3c1288296033.5942327.1469727453010.javamail.ya...@mail.yahoo.com%3E]; it is also no longer required after HDFS-10957.

> In HA mode, when there is a ledger in BK missing, which is generated after
> the last checkpoint, NN can not restore it.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3908
>                 URL: https://issues.apache.org/jira/browse/HDFS-3908
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.1-alpha
>            Reporter: Han Xiao
>
> In non-HA mode, when more than one edits.dir is configured, a missing editlog
> file in one dir does not cause a problem, because a replica exists in the
> other dir.
> However, in HA mode (using BK as shared storage), a missing ledger is not
> restored during NN startup, even if the corresponding editlog file exists in
> the local dir.
> The ledger stays missing while the NN is in standby state. When the NN
> enters active state, it reads the local editlog file that corresponds to the
> missing ledger. Unfortunately, the ledgers after the missing one in BK
> cannot be read in that phase (because of the gap).
> Therefore, in the following situation the editlogs cannot be restored, even
> though each editlog exists either in BK or in the local dir:
> 1. fsimage file: fsimage_0000000000000005946.md5
> 2. ledgers in ZK:
> [zk: localhost:2181(CONNECTED) 0] ls /hdfsEdit/ledgers/edits_00000000000000594
> edits_000000000000005941_000000000000005942
> edits_000000000000005943_000000000000005944
> edits_000000000000005945_000000000000005946
> edits_000000000000005949_000000000000005949
> (missing edits_000000000000005947_000000000000005948)
> 3. editlogs in the local editlog dir:
> -rw-r--r-- 1 root root      30 Sep 8 03:24 edits_0000000000000005947-0000000000000005948
> -rw-r--r-- 1 root root 1048576 Sep 8 03:35 edits_0000000000000005950-0000000000000005950
> -rw-r--r-- 1 root root 1048576 Sep 8 04:42 edits_0000000000000005951-0000000000000005951
> (missing edits_0000000000000005949-0000000000000005949)
> 4. the seen_txid:
> vm2:/tmp/hadoop-root/dfs/name/current # cat seen_txid
> 5949
> Here, we want to restore the editlog from txid 5946 (image) to txid
> 5949 (seen_txid). Txids 5947-5948 are missing in BK, and 5949-5949 is missing
> in the local dir.
> When the NN is started, the following exception is thrown:
> 2012-09-08 06:26:10,031 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring NN shutdown. Shutting down immediately.
> java.io.IOException: There appears to be a gap in the edit log. We expected txid 5949, but got txid 5950.
>     at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:163)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:692)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:223)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:182)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:599)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1325)
>     at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
>     at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
>     at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1233)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:990)
>     at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>     at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> 2012-09-08 06:26:10,036 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at vm2/160.161.0.155
> ************************************************************/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
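
The txid arithmetic in the quoted report can be checked with a small model. This is only an illustrative sketch, not NameNode or BookKeeper code: `find_gap`, `BK_LEDGERS`, and `LOCAL_EDITS` are hypothetical names, and segments are assumed to be inclusive (first_txid, last_txid) ranges taken from the listings above. It shows that neither source alone covers txids 5947 to 5949 (image txid + 1 up to seen_txid), while the two sources taken together would.

```python
from typing import Iterable, Optional, Tuple

Segment = Tuple[int, int]  # inclusive (first_txid, last_txid); assumed representation


def find_gap(segments: Iterable[Segment], start: int, end: int) -> Optional[int]:
    """Return the first txid in [start, end] that no segment covers, or None."""
    expected = start
    for first, last in sorted(segments):
        if last < expected:
            continue  # segment ends before the txid we still need
        if first > expected:
            break  # nothing covers `expected`: this is the gap
        expected = last + 1
        if expected > end:
            return None  # whole range covered
    return expected if expected <= end else None


# Segments as listed in the report (hypothetical variable names).
BK_LEDGERS = [(5941, 5942), (5943, 5944), (5945, 5946), (5949, 5949)]
LOCAL_EDITS = [(5947, 5948), (5950, 5950), (5951, 5951)]

IMAGE_TXID = 5946  # from fsimage_0000000000000005946
SEEN_TXID = 5949   # from seen_txid

start, end = IMAGE_TXID + 1, SEEN_TXID
print(find_gap(BK_LEDGERS, start, end))                # 5947: ledger 5947-5948 is missing
print(find_gap(LOCAL_EDITS, start, end))               # 5949: next local segment starts at 5950
print(find_gap(BK_LEDGERS + LOCAL_EDITS, start, end))  # None: the union has no gap
```

The local-only result matches the reported message ("We expected txid 5949, but got txid 5950"). Whether the NameNode should combine both sources during failover is exactly the question the issue raises; the sketch only confirms that the data needed for recovery was present across the two locations.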