[
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245720#comment-13245720
]
Matteo Bertozzi commented on HBASE-5666:
----------------------------------------
Still looking at the 0.90 code...
The new ZooKeeperWatcher (>=0.92) calls ZKUtil.createAndFailSilent() to
create the base node and the others only when called by HMaster
(canCreateBaseZNode = true), whereas before the code path was the same for
everyone.
So now, if HMaster has not yet reached the "create base node" point when the
HRegionServer checks for the existence of the base node, the region server
crashes.
If we want to keep the previous logic, where the first one that arrives
creates the base node & co, we can remove the canCreateBaseZNode flag;
otherwise we can use HBASE-5666-v4.patch to wait and retry on checkExists().
What do you think?
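For illustration, the "wait and retry" option could look roughly like the sketch below. This is not the code from HBASE-5666-v4.patch: the class BaseNodeWaiter, the method waitForBaseNode and its parameters are hypothetical names, and the existence check is passed in as a plain BooleanSupplier so the example does not depend on any HBase or ZooKeeper API.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of HBase or of HBASE-5666-v4.patch. */
public class BaseNodeWaiter {

    /**
     * Polls the given existence check until it succeeds or the attempts run out.
     *
     * @param exists      assumed wrapper around the real "does the base znode exist" check
     * @param maxAttempts maximum number of checks to perform
     * @param sleepMillis pause between two consecutive checks
     * @return true if the node was seen, false if we gave up or were interrupted
     */
    public static boolean waitForBaseNode(BooleanSupplier exists,
                                          int maxAttempts,
                                          long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (exists.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

In the region server, the supplier would wrap whatever existence check initializeZooKeeper() already performs, and the server would abort only when waitForBaseNode() returns false instead of after the first failed check.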
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
> Key: HBASE-5666
> URL: https://issues.apache.org/jira/browse/HBASE-5666
> Project: HBase
> Issue Type: Bug
> Components: regionserver, zookeeper
> Reporter: Matteo Bertozzi
> Assignee: Matteo Bertozzi
> Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch,
> HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log,
> hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log,
> hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed
> mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and
> HRegionServer.initializeZooKeeper() checks just once whether the base node is
> available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed. Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
>     at java.lang.Thread.run(Thread.java:662)
> {code}