[ https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994966#comment-12994966 ]
ryan rawson commented on HBASE-3065:
------------------------------------
Can you check your ZK cluster health? There is a link at the top of the master
page called 'zk dump'.
We had a situation where 2 of our 5 quorum members were not actually part of
the quorum, and you get error messages like that a lot. We changed the logging,
so it might be surfacing a deployment issue on your end.
> Retry all 'retryable' zk operations; e.g. connection loss
> ---------------------------------------------------------
>
> Key: HBASE-3065
> URL: https://issues.apache.org/jira/browse/HBASE-3065
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Fix For: 0.92.0
>
>
> The 'new' master refactored our zk code, tidying up all zk accesses and
> corralling them behind nice zk utility classes. One improvement was letting
> all KeeperExceptions out so the client could deal with them. That's good
> generally, because in the old days we'd suppress important zk state changes.
> But there is at least one case the new zk utility could handle for the
> application, and that's the class of retryable KeeperExceptions. The one that
> comes to mind is connection loss. On connection loss we should retry the
> just-failed operation. Usually the retry will just work. At worst, on
> reconnect, we'll pick up the expired session event.
> Adding this change shouldn't be too bad, given that the zk refactor corralled
> all zk access into only one or two classes.
> One thing to consider, though, is how much we should retry. We could retry on
> a timer, or we could retry forever as long as a Stoppable is passed in, so
> that if another thread has stopped or aborted the hosting service, we'll
> notice and give up trying. Doing the latter is probably better than some
> kind of timeout.
> HBASE-3062 adds a timed retry on the first zk operation. This issue is about
> generalizing what was done there to all zk access.
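The "retry retryable failures until the hosting service is stopped" approach described in the issue might look roughly like the Java sketch below. It is not HBase or ZooKeeper code: ZkCode, ZkException and RetryingZk are hypothetical stand-ins (for KeeperException.Code, KeeperException and the zk utility classes) so the example compiles on its own, and the Stoppable interface here only mirrors the two methods of HBase's. The issue names connection loss as retryable; also retrying OPERATIONTIMEOUT is an assumption.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-ins so the sketch compiles without ZooKeeper or HBase jars.
// ZkCode mirrors a few values of org.apache.zookeeper.KeeperException.Code.
enum ZkCode { CONNECTIONLOSS, OPERATIONTIMEOUT, SESSIONEXPIRED, NONODE }

// Stand-in for KeeperException: a checked exception carrying an error code.
class ZkException extends Exception {
  final ZkCode code;
  ZkException(ZkCode code) { super(code.name()); this.code = code; }
}

// Mirrors the two methods of HBase's Stoppable interface.
interface Stoppable {
  void stop(String why);
  boolean isStopped();
}

final class RetryingZk {
  private RetryingZk() {}

  // Connection loss is the case named in the issue; retrying OPERATIONTIMEOUT
  // too is an assumption. SESSIONEXPIRED is never retried: the session is gone.
  static boolean isRetryable(ZkCode code) {
    return code == ZkCode.CONNECTIONLOSS || code == ZkCode.OPERATIONTIMEOUT;
  }

  // Runs op, retrying retryable failures for as long as the hosting service
  // has not been stopped or aborted. Non-retryable failures are rethrown at once.
  static <T> T call(Callable<T> op, Stoppable stopper, long sleepMs) throws Exception {
    while (!stopper.isStopped()) {
      try {
        return op.call();
      } catch (ZkException e) {
        if (!isRetryable(e.code)) {
          throw e; // e.g. SESSIONEXPIRED or NONODE: let the caller deal with it
        }
        Thread.sleep(sleepMs); // brief pause before retrying the just-failed op
      }
    }
    throw new IllegalStateException("hosting service stopped; giving up on zk op");
  }
}
```

Because the loop checks isStopped() before every attempt, an abort from another thread ends the retries without any timeout, which is the trade-off the issue argues for over a timer-based retry.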
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira