[
https://issues.apache.org/jira/browse/HDFS-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064893#comment-14064893
]
Rakesh R commented on HDFS-4154:
--------------------------------
Hi [~yians], yes, I see your point. In this case we would need a ZK lock
mechanism to fully isolate format(). Otherwise there are several scenarios in
which it can hit exceptions.
But I have a different thought after looking at the BOOTSTRAPSTANDBY option.
IMHO it is not necessary to add extra logic to handle concurrency between two
FORMAT calls, provided the user can ensure that FORMAT is called on only one NN
server and the other NN server is started with the BOOTSTRAPSTANDBY option.
What's your opinion on this?
> BKJM: Two namenodes using bkjm can race to create the version znode
> ------------------------------------------------------------------
>
> Key: HDFS-4154
> URL: https://issues.apache.org/jira/browse/HDFS-4154
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: 3.0.0, 2.0.3-alpha
> Reporter: Ivan Kelly
> Assignee: Han Xiao
> Attachments: HDFS-4154.patch
>
>
> Two namenodes using BKJM can race to create the version znode, and one will
> get the following error.
> 2012-11-06 10:04:00,200 INFO hidden.bkjournal.org.apache.zookeeper.ClientCnxn: Session establishment complete on server 109-231-69-172.flexiscale.com/109.231.69.172:2181, sessionid = 0x13ad528fcfe0005, negotiated timeout = 4000
> 2012-11-06 10:04:00,710 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
> java.lang.IllegalArgumentException: Unable to construct journal, bookkeeper://109.231.69.172:2181;109.231.69.173:2181;109.231.69.174:2181/hdfsjournal
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1251)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initSharedJournalsForRead(FSEditLog.java:206)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:657)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:590)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:259)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:544)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:423)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:385)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:401)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:435)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:611)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:592)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1135)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1201)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1249)
>     ... 14 more
> Caused by: java.io.IOException: Error initializing zk
>     at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.<init>(BookKeeperJournalManager.java:233)
>     ... 19 more
> Caused by: hidden.bkjournal.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hdfsjournal/version
>     at hidden.bkjournal.org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at hidden.bkjournal.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at hidden.bkjournal.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:778)
>     at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.<init>(BookKeeperJournalManager.java:222)
>     ... 19 more
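
The innermost cause in the trace above is a NodeExistsException thrown from
ZooKeeper.create on /hdfsjournal/version. As a rough illustration only (this is
not the attached patch and not the BookKeeperJournalManager code), one way to
tolerate that race is to treat "node already exists" as success when the
existing data matches what we were about to write. Everything named below is an
assumption for the sketch: InMemoryZk is a hypothetical in-memory stand-in for
a ZooKeeper client, its NodeExistsException is an unchecked stand-in for the
real checked exception, and the version payload is assumed to be a decimal
layout version string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class VersionZnodeSketch {

    /** Hypothetical, unchecked stand-in for ZooKeeper's NodeExistsException. */
    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) {
            super("NodeExists for " + path);
        }
    }

    /** Hypothetical in-memory store: create() fails if the path already exists. */
    static class InMemoryZk {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        void create(String path, byte[] data) {
            // putIfAbsent is atomic, so exactly one concurrent creator wins.
            if (nodes.putIfAbsent(path, data.clone()) != null) {
                throw new NodeExistsException(path);
            }
        }

        byte[] getData(String path) {
            byte[] data = nodes.get(path);
            return data == null ? null : data.clone();
        }
    }

    /**
     * Creates the version node, or accepts an existing one with identical data.
     * Returns true if this caller created the node, false if another caller
     * had already created it with the same content.
     */
    static boolean ensureVersion(InMemoryZk zk, String path, int layoutVersion) {
        byte[] expected = Integer.toString(layoutVersion).getBytes(StandardCharsets.UTF_8);
        try {
            zk.create(path, expected);
            return true;
        } catch (NodeExistsException e) {
            // Lost the race: only fail if the winner wrote something different.
            byte[] existing = zk.getData(path);
            if (!Arrays.equals(existing, expected)) {
                throw new IllegalStateException("Version mismatch at " + path, e);
            }
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryZk zk = new InMemoryZk();
        String path = "/hdfsjournal/version";
        System.out.println("first caller created node: " + ensureVersion(zk, path, -1));
        System.out.println("second caller created node: " + ensureVersion(zk, path, -1));
    }
}
```

This only covers the create race on the version node. It does not isolate
format() itself, which is the case the comment above says would need a ZK lock.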