[https://issues.apache.org/jira/browse/HBASE-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706717#action_12706717]
ryan rawson commented on HBASE-1384:
------------------------------------
then after i do a stop-hbase.sh / start-hbase.sh the client says:
2009-05-06 21:09:33,736 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:709)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:235)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:346)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:342)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:366)
I thought we fixed the case where the ZK session expires? We're still
suffering from this bug.
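For context on what a fix has to look like: once a ZooKeeper session expires, the existing ZooKeeper handle is permanently dead; operations on it keep throwing SessionExpiredException (as in the trace above). The only recovery is to discard the handle, open a brand-new session, and re-register watches and ephemeral nodes. The sketch below models that pattern with a hypothetical `FakeSession` stand-in rather than the real org.apache.zookeeper.ZooKeeper API; the names are illustrative, not HBase's actual code:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SessionRecovery {
    // Stand-in for a ZooKeeper handle. A real handle signals expiry via
    // KeeperException.SessionExpiredException or Watcher.Event.KeeperState.Expired.
    static class FakeSession {
        final long id;
        volatile boolean expired;
        FakeSession(long id) { this.id = id; }
    }

    static final AtomicInteger sessionsCreated = new AtomicInteger();

    // Stand-in for constructing a fresh ZooKeeper instance (new session id).
    static FakeSession connect() {
        return new FakeSession(sessionsCreated.incrementAndGet());
    }

    // On expiry, do NOT retry operations on the old handle (that retry loop is
    // the permanently "hosed" state in this report); build a fresh session.
    // A real implementation would also re-register watches and re-create the
    // server's ephemeral node under /hbase/rs here.
    static FakeSession ensureLive(FakeSession s) {
        if (s == null || s.expired) {
            return connect();  // new session, new watches
        }
        return s;              // session still good; keep using it
    }
}
```

The key design point is that recovery replaces the handle object rather than retrying through it, since expiry is unrecoverable at the handle level by design.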
> when a regionserver loses its ZK connection, it becomes permanently hosed
> -------------------------------------------------------------------------
>
> Key: HBASE-1384
> URL: https://issues.apache.org/jira/browse/HBASE-1384
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.20.0
> Reporter: ryan rawson
> Assignee: Nitay Joffe
> Fix For: 0.20.0
>
>
> Some regionservers lost their ZK connection (timed out) then this happened:
> 2009-05-06 21:09:31,558 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x1210ac3ab1400e1 to sun.nio.ch.selectionkeyi...@736921fd
> java.io.IOException: TIMED OUT
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
> 2009-05-06 21:09:31,558 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message (Retry: 0)
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:539)
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:496)
>         at java.lang.Thread.run(Thread.java:717)
> At this point, the regionserver has been hosed for over an hour and shows no
> signs of returning.
> Of my 19 regionservers, 15 are left, and when I ls /hbase/rs I only see 15
> ephemeral nodes.
> But the master isn't giving them up and refuses to let the regionservers rejoin
> the cluster.