[
https://issues.apache.org/jira/browse/HDFS-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453764#comment-13453764
]
Vinay commented on HDFS-3908:
-----------------------------
Hi Xiao,
Here is my understanding of the problem:
1. In the non-HA case, if some edit logs are missing from some name dirs, the
NameNode will still start by loading them from the available directories.
2. In the HA case, if some edit logs are missing from the shared storage but
the same logs are available in the local directories, why can't the NameNode
load them and start from there? Right?
As of now, this is not supported, because the NameNode cannot be sure that all
edit logs are available in the local directories. The NameNode HA design
therefore requires the shared storage to be HIGHLY AVAILABLE (I mean without
any data loss). So if even one edit log is missing from the shared storage, no
further edit logs will be selected for reading.
*Why does non-HA support this scenario?*
Local edits can be kept in multiple directories, so corruption of one
directory is not a problem: the NameNode will select and read the edits from
the alternative directories.
*Why doesn't HA support it?*
The shared storage is the common storage for both the ACTIVE and STANDBY
NameNodes, and the local edits present on the ACTIVE will differ from the
STANDBY's local edits. So the shared storage must always contain all required
edit logs. While in STANDBY state, the NameNode loads only from the shared
storage, because its local directories do not have the complete edits. During
switch-over, the NameNode will try to load from both local and shared storage.
In your case, since the shared storage (BookKeeper) is missing the first edit
log the NameNode looks for, no further edit logs are selected for reading, and
the local edits also have gaps, so the NameNode shut down.
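To make the selection rule concrete, here is a small illustrative sketch (NOT the actual HDFS/FSEditLog code; the function name is hypothetical and the segment lists are just modeled on the numbers from this report) of a reader that only accepts an unbroken chain of txids and drops everything after the first gap:

```python
# Illustrative sketch only -- not the real HDFS code. Models how a reader
# that requires contiguous transaction IDs stops at the first gap.

def select_contiguous(start_txid, segments):
    """Select segments forming an unbroken txid chain from start_txid.

    segments: list of (first_txid, last_txid) tuples for one storage.
    Returns (selected_segments, next_txid_still_needed).
    """
    chain = []
    next_txid = start_txid
    for first, last in sorted(segments):
        if last < next_txid:    # segment ends before what we need: skip it
            continue
        if first > next_txid:   # gap: nothing after it gets selected
            break
        chain.append((first, last))
        next_txid = last + 1
    return chain, next_txid

# Shared storage (BookKeeper) from the report: 5947-5948 is missing, so
# the 5949 segment sitting after the gap is never selected.
shared = [(5941, 5942), (5943, 5944), (5945, 5946), (5949, 5949)]
print(select_contiguous(5947, shared))   # ([], 5947)

# Local dir from the report: 5947-5948 is present, but 5949 is missing,
# so loading stops at 5949 while the next segment starts at 5950 -- the
# "expected txid 5949, but got txid 5950" gap from the log below.
local = [(5947, 5948), (5950, 5950), (5951, 5951)]
print(select_contiguous(5947, local))    # ([(5947, 5948)], 5949)
```

In non-HA mode the same kind of scan runs over several local dirs, so a second dir usually covers the gap; in HA standby only the shared segments are candidates, which is why a single missing ledger blocks everything after it.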
> In HA mode, when there is a ledger in BK missing, which is generated after
> the last checkpoint, NN can not restore it.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-3908
> URL: https://issues.apache.org/jira/browse/HDFS-3908
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 2.0.1-alpha
> Reporter: Han Xiao
>
> In the non-HA case, when the number of edits dirs is larger than 1, a
> missing edit log file in one dir will not cause a problem, because of the
> replica in the other dir.
> However, in HA mode (using BK as shared storage), if a ledger is missing, it
> will not be restored when the NN starts, even if the related edit log file
> exists in the local dir.
> The gap remains while the NN is still in standby state. When the NN enters
> active state, it reads the edit log file (corresponding to the missing
> ledger) from the local dir. But, unfortunately, the ledger after the missing
> one in BK can't be read at that phase (because of the gap).
> Therefore, in the following situation, edit logs cannot be restored even
> though every edit log exists either in BK or in the local dir:
> 1. fsimage file: fsimage_0000000000000005946.md5
> 2. ledgers in zk:
> [zk: localhost:2181(CONNECTED) 0] ls /hdfsEdit/ledgers/edits_00000000000000594
> edits_000000000000005941_000000000000005942
> edits_000000000000005943_000000000000005944
> edits_000000000000005945_000000000000005946
> edits_000000000000005949_000000000000005949
> (missing edits_000000000000005947_000000000000005948)
> 3. edit logs in the local editlog dir:
> -rw-r--r-- 1 root root 30 Sep 8 03:24
> edits_0000000000000005947-0000000000000005948
> -rw-r--r-- 1 root root 1048576 Sep 8 03:35
> edits_0000000000000005950-0000000000000005950
> -rw-r--r-- 1 root root 1048576 Sep 8 04:42
> edits_0000000000000005951-0000000000000005951
> (missing edits_0000000000000005949-0000000000000005949)
> 4. and the seen_txid:
> vm2:/tmp/hadoop-root/dfs/name/current # cat seen_txid
> 5949
> Here, we want to restore the edit logs from txid 5946 (image) to txid 5949
> (seen_txid). Txids 5947-5948 are missing in BK, and 5949 is missing in the
> local dir.
> When starting the NN, the following exception is thrown:
> 2012-09-08 06:26:10,031 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring NN shutdown. Shutting down immediately.
> java.io.IOException: There appears to be a gap in the edit log. We expected txid 5949, but got txid 5950.
> at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:163)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:692)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:223)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:182)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:599)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1325)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1233)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:990)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> 2012-09-08 06:26:10,036 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at vm2/160.161.0.155
> ************************************************************/
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira