[ https://issues.apache.org/jira/browse/HBASE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856719#action_12856719 ]
Todd Lipcon commented on HBASE-2441:
------------------------------------
I think I caused this by starting an RS while the master was down and then
killing ZK. The RS first hit an NPE because metrics had not been initialized
yet when abort() was called:
{code}
2010-04-13 17:40:28,495 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1263)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:373)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
{code}
and then looped forever with:
{code}
2010-04-13 18:00:19,158 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Start code already taken, trying another one
2010-04-13 18:00:19,158 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase/rs -- check quorum servers, currently=monster01.sf.cloudera.com:2222
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:405)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeRSLocation(ZooKeeperWrapper.java:586)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1339)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:428)
    at java.lang.Thread.run(Thread.java:619)
{code}
> ZK failures early in RS startup sequence cause infinite busy loop
> -----------------------------------------------------------------
>
> Key: HBASE-2441
> URL: https://issues.apache.org/jira/browse/HBASE-2441
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.20.3
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> If the RS loses its ZK session before it reports for duty, the abort() call
> triggers an NPE, so the stop boolean never gets set. The RS then loops
> forever trying to register itself in the expired ZK session, filling up
> the logs.
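
The failure mode described above can be illustrated with a small, self-contained Java sketch. This is not the HRegionServer code and not the actual patch: the class StartupAbortSketch, its Metrics stand-in, and the method names abortUnsafe, abortSafe, and reportForDuty are hypothetical and exist only for illustration. The sketch assumes that abort() touches the metrics object before setting the stop flag, which matches the symptom in the description. The registration loop is bounded by maxAttempts so the example terminates; the real loop had no such bound.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical, simplified stand-in for a region server; not HBase code. */
final class StartupAbortSketch {

    /** Stand-in for a metrics object that is still null early in startup. */
    static final class Metrics {
        void recordAbort() {
            // Intentionally empty in this sketch.
        }
    }

    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    // Assumption: never assigned here, to model an abort that arrives
    // before startup has initialized metrics.
    private Metrics metrics;

    /** Problematic ordering: an NPE on metrics skips setting the stop flag. */
    void abortUnsafe() {
        metrics.recordAbort();     // throws NullPointerException while metrics is null
        stopRequested.set(true);   // not reached when the line above throws
    }

    /** Safer ordering: set the stop flag first, then guard optional state. */
    void abortSafe() {
        stopRequested.set(true);
        if (metrics != null) {
            metrics.recordAbort();
        }
    }

    boolean isStopRequested() {
        return stopRequested.get();
    }

    /**
     * Registration loop. tryRegister models the ZK write and returns false
     * when the session has expired. Returns the number of attempts made.
     */
    int reportForDuty(BooleanSupplier tryRegister, int maxAttempts) {
        int attempts = 0;
        while (!stopRequested.get() && attempts < maxAttempts) {
            attempts++;
            if (tryRegister.getAsBoolean()) {
                break;
            }
        }
        return attempts;
    }
}
```

With abortUnsafe, the stop flag stays false after the NPE, so reportForDuty keeps retrying until the bound is hit. With abortSafe, the loop exits without making an attempt. Setting the flag before touching any state that may not be initialized is one possible way to avoid the loop, under the assumptions stated above.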