[
https://issues.apache.org/jira/browse/HBASE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431238#comment-13431238
]
nkeywal commented on HBASE-6523:
--------------------------------
I agree, the ZooKeeper list comes in handy for these questions :-).
To me, to be validated by ZK experts, ConnectionLoss means that we lost the
connection, but we hope it will come back. When it comes back, we receive all
the events, and there should be no data loss. For a SessionTimeout, by
contrast, we may have lost events, so we should re-create the watchers and,
from an application point of view, take into account that we may have missed
events in the meantime.
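To make the distinction concrete, here is a minimal sketch of the two cases as
I describe them above. All the names (ZkCondition, Reaction,
ZkConditionPolicy) are hypothetical and are not part of the ZooKeeper or HBase
API; it only models the decision, it does not talk to ZooKeeper.

```java
// Hypothetical model of the two failure conditions discussed above.
// None of these types exist in ZooKeeper or HBase.
enum ZkCondition { CONNECTION_LOSS, SESSION_EXPIRED }

enum Reaction { WAIT_FOR_RECONNECT, RECREATE_SESSION_AND_WATCHERS }

final class ZkConditionPolicy {
    private ZkConditionPolicy() {}

    // Connection loss: keep the session and wait for the connection to return.
    // Session expiry: a new session is needed and the watchers must be set again.
    static Reaction reactTo(ZkCondition condition) {
        switch (condition) {
            case CONNECTION_LOSS:
                return Reaction.WAIT_FOR_RECONNECT;
            case SESSION_EXPIRED:
                return Reaction.RECREATE_SESSION_AND_WATCHERS;
            default:
                throw new IllegalArgumentException("unknown condition: " + condition);
        }
    }

    // Only after a session expiry must the application assume missed events.
    static boolean mayHaveMissedEvents(ZkCondition condition) {
        return condition == ZkCondition.SESSION_EXPIRED;
    }
}
```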
The way we manage session timeouts in HBase/RecoverableZK is tricky: we retry,
because we expect that a parallel abort will have triggered a ZK session
recreation, so our next retry will run on a brand new ZK session (and a new
ZooKeeper object in the RecoverableZK) and so it will work.
As we retry only a limited number of times in the RecoverableZK, for a
ConnectionLoss we may stop retrying before the session timeout occurs, and
throw the exception to the calling layer. As such it may become an
unrecoverable error from an HBase point of view. I think that if we want to
fix this we should change RecoverableZooKeeper to make it retry indefinitely
on a ConnectionLoss, waiting for the session timeout to occur. We may also
have calls that do not use the RecoverableZK (if I remember well I've seen a
few, and it was justified I believe). But we should not re-create a session
for a connection loss (it could have bad side effects, with ZK having to manage
too many sessions, the old and the new, for example).
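A rough sketch of the retry behaviour I have in mind follows. It is not the
RecoverableZooKeeper code: the class, the two exception types and the
maxWaitMs safety bound are all made up for the illustration (in a real client
the end of the wait would be signalled by the session expiry itself). The
point is only that a connection loss is retried on the same session, while a
session expiry is passed straight to the caller.

```java
import java.util.function.Supplier;

// Hypothetical sketch; not the HBase RecoverableZooKeeper implementation.
final class ConnectionLossRetry {
    private ConnectionLossRetry() {}

    // Stand-ins for the ZooKeeper failure conditions (assumed, unchecked here).
    static class ConnectionLossException extends RuntimeException {}
    static class SessionExpiredException extends RuntimeException {}

    // Retries op on connection loss without creating a new session.
    // A SessionExpiredException is not caught and reaches the caller at once.
    // maxWaitMs is an assumed safety bound so the loop cannot spin forever.
    static <T> T retryOnConnectionLoss(Supplier<T> op, long maxWaitMs, long pauseMs) {
        final long start = System.nanoTime();
        while (true) {
            try {
                return op.get();
            } catch (ConnectionLossException e) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
                if (elapsedMs >= maxWaitMs) {
                    throw e; // give up only once the bound is reached
                }
                try {
                    Thread.sleep(pauseMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

A caller would wrap each ZK operation in retryOnConnectionLoss and handle the
session expiry separately, for example by re-creating the session and the
watchers.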
> HConnectionImplementation still does not recover from all ZK issues.
> --------------------------------------------------------------------
>
> Key: HBASE-6523
> URL: https://issues.apache.org/jira/browse/HBASE-6523
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.0
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Attachments: 6523.txt
>
>
> During some testing here at Salesforce.com we found another scenario where an
> HConnectionImplementation would never recover from a lost ZK connection.