[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250868#comment-13250868 ]
Matteo Bertozzi commented on HBASE-5666:
----------------------------------------
SOCKET_RETRY_WAIT_MS is only 200 ms, but yes, it is better to use an
interruptible sleep, since the code can handle interrupts. The only real
difference is that without it you have to wait for the timeout to expire if
you want to kill the initialization.

The retry loop is tricky to understand because RecoverableZookeeper is
involved. If you pass 0 as the timeout, you would expect a single attempt, but
recoverableZookeeper.exists() itself retries on CONNECTIONLOSS,
SESSIONEXPIRED, and OPERATIONTIMEOUT. The idea here is to keep retrying for up
to x milliseconds until the znode becomes available, i.e. while
(recoverableZookeeper.exists() == null).

If the client comes up during this time, I think it should crash anyway,
because the HRegion is still in the initialize() method.
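A minimal sketch of the wait-until-available loop described above, not the
actual patch. It assumes the znode check is abstracted as a BooleanSupplier
standing in for recoverableZookeeper.exists() != null; the names ZNodeWaiter
and waitForZNode are hypothetical. The check always runs at least once (so a
0 timeout means a single attempt), and it sleeps with Thread.sleep() so an
interrupt aborts the wait instead of forcing the caller to sit out the whole
timeout. In real use, retryWaitMs would be something like the 200 ms
SOCKET_RETRY_WAIT_MS.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not HBase code: polls a znode-existence check until it
// succeeds or the timeout elapses.
public final class ZNodeWaiter {
  private ZNodeWaiter() {}

  /**
   * Returns true as soon as znodeExists reports true, false once timeoutMs has
   * elapsed. The check always runs at least once, so timeoutMs == 0 means a
   * single attempt.
   *
   * @throws InterruptedException if the thread is interrupted while sleeping,
   *     which lets the caller abort initialization without waiting out the timeout
   */
  public static boolean waitForZNode(BooleanSupplier znodeExists, long timeoutMs,
      long retryWaitMs) throws InterruptedException {
    // nanoTime() is monotonic, so wall-clock adjustments cannot stretch the wait.
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (znodeExists.getAsBoolean()) {
        return true;
      }
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMs <= 0) {
        return false;
      }
      // Interruptible sleep, capped so we never overshoot the deadline.
      Thread.sleep(Math.min(retryWaitMs, remainingMs));
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulated base znode that "appears" on the third check.
    int[] checks = {0};
    boolean found = waitForZNode(() -> ++checks[0] >= 3, 5_000, 10);
    System.out.println("found=" + found + " after " + checks[0] + " checks");
  }
}
```

Running main prints "found=true after 3 checks". Note that in the real code a
single check can itself take a while, since exists() retries internally.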
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
> Key: HBASE-5666
> URL: https://issues.apache.org/jira/browse/HBASE-5666
> Project: HBase
> Issue Type: Bug
> Components: regionserver, zookeeper
> Affects Versions: 0.92.1, 0.94.0, 0.96.0
> Reporter: Matteo Bertozzi
> Assignee: Matteo Bertozzi
> Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch,
> HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch,
> HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log,
> hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log,
> hbase-regionserver.log, hbase-zookeeper.log
>
>
> I have a script that starts HBase and a couple of region servers in
> distributed mode (hbase.cluster.distributed = true):
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers fail to start.
> It seems that while the RS is starting, the znode is not yet available, and
> HRegionServer.initializeZooKeeper() checks only once whether the base node is
> available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed. Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
>     at java.lang.Thread.run(Thread.java:662)
> {code}