[ https://issues.apache.org/jira/browse/HBASE-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benoit Sigoure updated HBASE-2849:
----------------------------------
Attachment: 0001-HBASE-2849-Have-HBase-clients-recover-from-ZooKeeper.patch
Patch that fixes the issue. There was already some logic in
{{HConnectionManager}}, which I hadn't noticed earlier, that attempts to deal
with ZK failures and reconnect when needed. That code wasn't doing the right
thing, though, and didn't work when the HBase client lost its connection to
the ZK quorum. So the patch is rather simple: it fixes the existing logic in
{{HConnectionManager.ClientZKWatcher}}.
I tested this by starting a long-running HBase application, killing the whole
ZooKeeper ensemble and restarting it. The application experiences a hiccup
while ZK is unavailable and recovers automatically soon after the ZK quorum
is back online. Someone else is more than welcome to write a unit test that
simulates this scenario if they feel like it.
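For readers who want a feel for the kind of recovery logic being discussed, here is a rough, self-contained sketch. It is not the code from the attached patch and does not use the real ZooKeeper or HBase classes: {{SessionEvent}}, {{ZkHandle}} and {{ReconnectingWatcher}} are hypothetical stand-ins. The sketch assumes that a plain disconnection is retried by the underlying client, while an expired session has to be thrown away and reopened on next use. Which events the actual patch treats as fatal is not stated in this message.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins, not the real ZooKeeper or HBase types.
enum SessionEvent { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

interface ZkHandle {
    void close();
}

/** Sketch of a watcher that drops its handle once the session is unusable. */
final class ReconnectingWatcher {
    private final Supplier<ZkHandle> factory; // opens a fresh session
    private ZkHandle handle;                  // null until first use or after a reset
    private int sessionsOpened;

    ReconnectingWatcher(Supplier<ZkHandle> factory) {
        this.factory = factory;
    }

    /** Called for every session event delivered by the (simulated) ZK client. */
    synchronized void process(SessionEvent event) {
        switch (event) {
            case SYNC_CONNECTED:
                break; // session is healthy, nothing to do
            case DISCONNECTED:
                break; // assumption: the underlying client retries the same session
            case EXPIRED:
                reset(); // the old session can never come back
                break;
        }
    }

    /** Returns the current handle, opening a new session if the old one was dropped. */
    synchronized ZkHandle get() {
        if (handle == null) {
            handle = factory.get();
            sessionsOpened++;
        }
        return handle;
    }

    synchronized int sessionsOpened() {
        return sessionsOpened;
    }

    private void reset() {
        if (handle != null) {
            handle.close();
            handle = null;
        }
    }
}
```

A unit test along the lines suggested above could feed {{DISCONNECTED}} and {{EXPIRED}} events into such a watcher and check that only the latter causes a new handle to be opened.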
> HBase clients cannot recover when their ZooKeeper session becomes invalid
> -------------------------------------------------------------------------
>
> Key: HBASE-2849
> URL: https://issues.apache.org/jira/browse/HBASE-2849
> Project: HBase
> Issue Type: Bug
> Components: client
> Affects Versions: 0.89.20100621
> Reporter: stack
> Assignee: Benoit Sigoure
> Priority: Critical
> Fix For: 0.90.0
>
> Attachments:
> 0001-HBASE-2849-Have-HBase-clients-recover-from-ZooKeeper.patch
>
>
> Someone made mention of this loop last week but I don't think I filed an
> issue. Here is another instance, again from a secret hbase admirer:
> "It seems that when Zookeeper dies and restarts, all client applications need
> to be restarted too. I just restarted HBase in non-distributed mode (which
> includes a ZK) and now my application can't reconnect to ZK unless I restart
> it too. I'm stuck in this loop:
> {code}
> 2010-07-19 00:13:05,725 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:55153 (no session established for client)
> 2010-07-19 00:13:07,052 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /127.0.0.1:55154
> 2010-07-19 00:13:07,053 INFO org.apache.zookeeper.server.NIOServerCnxn: Refusing session request for client /127.0.0.1:55154 as it has seen zxid 0xf5 our last zxid is 0xd7 client must try another server
> {code}
> "