[ 
https://issues.apache.org/jira/browse/HBASE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856719#action_12856719
 ] 

Todd Lipcon commented on HBASE-2441:
------------------------------------

I think I caused this by starting an RS while the master was down and then 
killing ZK. First I got an NPE, because metrics weren't initialized yet when 
abort() was called:
{code}
2010-04-13 17:40:28,495 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher 
java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1263)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:373)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
{code}
and then the RS looped forever with:
{code}
2010-04-13 18:00:19,158 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Start code already taken, trying another one
2010-04-13 18:00:19,158 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase/rs -- check quorum servers, currently=monster01.sf.cloudera.com:2222
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:405)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeRSLocation(ZooKeeperWrapper.java:586)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1339)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:428)
        at java.lang.Thread.run(Thread.java:619)
{code}
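
The second trace suggests the registration loop keeps retrying against a session that can never recover, since an expired ZK session cannot be reused. A rough sketch of a loop that treats expiry as fatal might look like the following. All names here (ReportForDutySketch, SessionExpired, Registrar, register) are hypothetical stand-ins for illustration, not the actual HRegionServer or ZooKeeperWrapper code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not the real HRegionServer.reportForDuty().
class ReportForDutySketch {
    // Stand-in for KeeperException.SessionExpiredException (assumption).
    static class SessionExpired extends Exception {}

    // Stand-in for the ZK registration call; returns false when the
    // start code is already taken (assumption).
    interface Registrar {
        boolean tryRegister(long startCode) throws SessionExpired;
    }

    // Returns the number of attempts it took to register, or -1 if the
    // loop gave up because a stop was requested or the session expired.
    static int register(Registrar zk, AtomicBoolean stop, long startCode) {
        int attempts = 0;
        while (!stop.get()) {
            attempts++;
            try {
                if (zk.tryRegister(startCode)) {
                    return attempts;
                }
                startCode++; // start code already taken, try another one
            } catch (SessionExpired e) {
                // Retrying on an expired session cannot succeed, so
                // request a stop instead of spinning.
                stop.set(true);
            }
        }
        return -1;
    }
}
```

With a loop shaped like this, the "start code already taken" case still retries, but a session expiry ends the loop after a single attempt instead of filling the logs.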

> ZK failures early in RS startup sequence cause infinite busy loop
> -----------------------------------------------------------------
>
>                 Key: HBASE-2441
>                 URL: https://issues.apache.org/jira/browse/HBASE-2441
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> If the RS loses its ZK session before it reports for duty, the abort() call 
> will trigger an NPE, and then the stop boolean doesn't get toggled. The RS 
> will then loop forever trying to register itself in the expired ZK session, 
> and fill up the logs.
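
To illustrate the ordering problem described above, here is a minimal sketch, assuming the NPE comes from touching an uninitialized metrics object before the stop flag is set. The class and member names (RegionServerSketch, Metrics, stopRequested, abortUnsafe, abortSafe) are hypothetical and are not taken from HRegionServer:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the region server; not the real HBase code.
class RegionServerSketch {
    // Hypothetical metrics holder; stays null until startup gets far enough.
    static class Metrics {
        int aborts = 0;
    }

    final AtomicBoolean stopRequested = new AtomicBoolean(false);
    volatile Metrics metrics = null;

    // Problematic ordering: touching metrics first throws an NPE during
    // early startup, so the stop flag is never set.
    void abortUnsafe() {
        metrics.aborts++;
        stopRequested.set(true);
    }

    // Safer ordering: set the stop flag first, then do best-effort
    // bookkeeping only if metrics has been initialized.
    void abortSafe() {
        stopRequested.set(true);
        Metrics m = metrics;
        if (m != null) {
            m.aborts++;
        }
    }
}
```

Setting the flag before any other work means that a failure in the bookkeeping path can no longer leave the main loop running.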

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
