[ https://issues.apache.org/jira/browse/HDFS-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663967#comment-13663967 ]
Han Xiao commented on HDFS-4154:
--------------------------------
Sorry for the formatting earlier.
Hi Uma,
There is already a testcase for concurrent format:
TestBookKeeperJournalManager.testConcurrentFormat.
In its expected behavior, it catches the IOException and treats it as a good exception:
{code:java}
} catch (IOException ioe) {
LOG.info("Exception formatting ", ioe);
return ThreadStatus.GOODEXCEPTION;
{code}
However, during NameNode startup an IOException causes the NN to fail to
start, so the IOException should not be treated as a good exception.
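For illustration, the stricter classification could look like the following self-contained sketch (the ThreadStatus values mirror the test's, but this is not the actual test code):
{code:java}
import java.io.IOException;

// Sketch only: mirrors the thread-status idea in
// TestBookKeeperJournalManager.testConcurrentFormat, but treats an
// IOException during a concurrent format() as a failure, since the
// same exception would abort NameNode startup.
class FormatStatusSketch {
  enum ThreadStatus { COMPLETED, GOODEXCEPTION, BADEXCEPTION }

  static ThreadStatus statusFor(Exception e) {
    if (e instanceof IOException) {
      // Previously classified as GOODEXCEPTION; under the stricter
      // rule an IOException means the concurrent format really failed.
      return ThreadStatus.BADEXCEPTION;
    }
    return ThreadStatus.GOODEXCEPTION;
  }

  public static void main(String[] args) {
    System.out.println(statusFor(new IOException("format failed")));
    // prints BADEXCEPTION
  }
}
{code}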
I revised the testcase to treat IOException as bad as well, but then the
testcase cannot pass with the patch applied (it also fails before the
patch). I find the problem comes from:
{code:java}
// delete old info
if (zkc.exists(basePath, false) != null) {
if (zkc.exists(ledgerPath, false) != null) {
for (EditLogLedgerMetadata l : getLedgerList(true)) {
try {
bkc.deleteLedger(l.getLedgerId());
} catch (BKException.BKNoSuchLedgerExistsException bke) {
LOG.warn("Ledger " + l.getLedgerId() + " does not exist;"
+ " Cannot delete.");
}
}
}
ZKUtil.deleteRecursive(zkc, basePath);
}
{code}
Both bkc and ZKUtil may throw exceptions under concurrent access. Revising
this code to resolve each conflict individually is not suitable for them,
and it is ugly besides.
Therefore, I want to use a ZooKeeper lock to resolve this thoroughly. What do you think?
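The zk-lock idea could follow the standard ZooKeeper lock recipe: each formatter creates an ephemeral sequential znode under a lock parent (e.g. a hypothetical /hdfsjournal/formatLock), and only the client whose znode has the lowest sequence number proceeds to format. A minimal, self-contained sketch of just the ordering rule (plain Java, no live ZooKeeper; the znode names are illustrative, not BKJM code):
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the ordering rule behind a ZooKeeper lock recipe
// (hypothetical helper, not BKJM code): each formatter creates an
// ephemeral sequential znode such as lock-0000000003 under a lock
// parent, and only the client owning the lowest sequence proceeds.
class FormatLockOrder {
  /** True iff myNode has the lowest sequence number among children. */
  static boolean holdsLock(String myNode, List<String> children) {
    List<String> sorted = new ArrayList<>(children);
    // Zero-padded sequential suffixes sort correctly lexicographically.
    Collections.sort(sorted);
    return sorted.get(0).equals(myNode);
  }

  public static void main(String[] args) {
    List<String> children = List.of("lock-0000000002", "lock-0000000001");
    System.out.println(holdsLock("lock-0000000001", children)); // prints true
    System.out.println(holdsLock("lock-0000000002", children)); // prints false
  }
}
{code}
With a real ZooKeeper the children list would come from getChildren() on the lock parent, and a non-owner would watch the znode just below its own (or re-check after it disappears) before retrying or skipping the format.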
> BKJM: Two namenodes using bkjm can race to create the version znode
> ------------------------------------------------------------------
>
> Key: HDFS-4154
> URL: https://issues.apache.org/jira/browse/HDFS-4154
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: 3.0.0, 2.0.3-alpha
> Reporter: Ivan Kelly
> Assignee: Han Xiao
> Attachments: HDFS-4154.patch
>
>
> ...and one will get the following error.
> 2012-11-06 10:04:00,200 INFO hidden.bkjournal.org.apache.zookeeper.ClientCnxn: Session establishment complete on server 109-231-69-172.flexiscale.com/109.231.69.172:2181, sessionid = 0x13ad528fcfe0005, negotiated timeout = 4000
> 2012-11-06 10:04:00,710 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
> java.lang.IllegalArgumentException: Unable to construct journal, bookkeeper://109.231.69.172:2181;109.231.69.173:2181;109.231.69.174:2181/hdfsjournal
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1251)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initSharedJournalsForRead(FSEditLog.java:206)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:657)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:590)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:259)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:544)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:423)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:385)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:401)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:435)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:611)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:592)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1135)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1201)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1249)
>         ... 14 more
> Caused by: java.io.IOException: Error initializing zk
>         at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.<init>(BookKeeperJournalManager.java:233)
>         ... 19 more
> Caused by: hidden.bkjournal.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hdfsjournal/version
>         at hidden.bkjournal.org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at hidden.bkjournal.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at hidden.bkjournal.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:778)
>         at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.<init>(BookKeeperJournalManager.java:222)
>         ... 19 more
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira