[ https://issues.apache.org/jira/browse/HBASE-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891848#action_12891848 ]
Benoit Sigoure commented on HBASE-2849:
---------------------------------------
http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/org/apache/zookeeper/ZooKeeper.html
bq. If for some reason, the client fails to send heart beats to the server for
a prolonged period of time (exceeding the sessionTimeout value, for instance),
the server will expire the session, and the session ID will become invalid. The
client object will no longer be usable. To make ZooKeeper API calls, the
application must create a new client object.
So apparently, a new {{ZooKeeper}} object must be created when the session
becomes invalid. This sounds like a bad API; I'm not sure why they did it this
way. In HBase's source code, it seems that the only place that creates a
{{ZooKeeper}} instance is {{ZooKeeperWrapper#reconnectToZk}}. This method,
although it's public, is only called from 3 other methods in that class: the
constructor, {{exists}} and {{deleteUnassignedRegion}}. Of these,
{{deleteUnassignedRegion}} is only used by the master, and {{exists}} is only
called from the following locations:
* {{ZKUnassignedWatcher}}'s constructor. This is only used in the master.
* {{RSZookeeperUpdater#startRegionCloseEvent}}. This is only used in the
region server.
* {{ZooKeeperWrapper#createOrUpdateUnassignedRegion}}. This is only used by
the master's {{RegionManager}}.
* {{ZooKeeperWrapper#createUnassignedRegion}} and
{{ZooKeeperWrapper#updateUnassignedRegion}}. Those two methods, even though
they're public, are only called from
{{ZooKeeperWrapper#createOrUpdateUnassignedRegion}}, which itself is only used
by the master's {{RegionManager}}.
In other words, for someone writing an HBase application, only a single
{{ZooKeeper}} instance ever gets created: the one made when the
{{ZooKeeperWrapper}} is instantiated. Any failure that causes the client's
session to become invalid is unrecoverable with the current code, and the
client has to be killed and restarted.
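To illustrate what recovering on the client side could look like, here is a minimal sketch of a wrapper that throws away the expired handle and builds a new one. It is not HBase or ZooKeeper code: {{SessionClient}}, {{SessionExpiredException}} and {{ReconnectingClient}} are hypothetical stand-ins so the example compiles on its own. With the real API, the handle would be a {{ZooKeeper}} instance and expiry would surface as {{KeeperException.SessionExpiredException}} or a {{KeeperState.Expired}} watcher event.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a client handle whose session can expire.
// With the real API this role is played by org.apache.zookeeper.ZooKeeper.
interface SessionClient {
    boolean exists(String path) throws SessionExpiredException;
}

// Hypothetical stand-in for KeeperException.SessionExpiredException.
class SessionExpiredException extends Exception {
    SessionExpiredException(String msg) { super(msg); }
}

// Wrapper that replaces the underlying handle once its session has expired.
class ReconnectingClient {
    private final Supplier<SessionClient> factory;
    private SessionClient client;
    private int handlesCreated = 0;

    ReconnectingClient(Supplier<SessionClient> factory) {
        this.factory = factory;
        this.client = newHandle();
    }

    private SessionClient newHandle() {
        handlesCreated++;
        return factory.get();
    }

    synchronized boolean exists(String path) throws SessionExpiredException {
        try {
            return client.exists(path);
        } catch (SessionExpiredException e) {
            // The old handle is unusable: discard it, create a new one,
            // and retry once. A second expiry propagates to the caller.
            client = newHandle();
            return client.exists(path);
        }
    }

    synchronized int handlesCreated() { return handlesCreated; }
}
```

The single retry is a deliberate simplification. A real client would also have to re-register its watches and recreate any ephemeral nodes, since neither survives the expired session.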
Jonathan, is the work being done for the master rewrite branch going to address
this issue? Bear in mind that here I'm concerned about HBase *client*
applications.
> Clients stuck in loop doing "NIOServerCnxn: Closed socket connection"
> ---------------------------------------------------------------------
>
> Key: HBASE-2849
> URL: https://issues.apache.org/jira/browse/HBASE-2849
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Fix For: 0.90.0
>
>
> Someone made mention of this loop last week but I don't think I filed an
> issue. Here is another instance, again from a secret hbase admirer:
> "It seems that when Zookeeper dies and restarts, all client applications need
> to be restarted too. I just restarted HBase in non-distributed mode (which
> includes a ZK) and now my application can't reconnect to ZK unless I restart
> it too. I'm stuck in this loop:
> {code}
> 2010-07-19 00:13:05,725 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Closed socket connection for client /127.0.0.1:55153 (no session
> established for client)
> 2010-07-19 00:13:07,052 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Accepted socket connection from /127.0.0.1:55154
> 2010-07-19 00:13:07,053 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Refusing session request for client /127.0.0.1:55154 as it has seen zxid
> 0xf5 our last zxid is 0xd7
> client must try another server
> {code}
> "