What I have seen so far is mostly related to the initLimit/syncLimit
settings combined with snapshot size (ZOOKEEPER-1697, ZOOKEEPER-1521).
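As a rough way to sanity-check this, the sketch below compares the time a
follower is allowed for its initial connect and sync (initLimit ticks of
tickTime milliseconds each) against an estimated snapshot transfer time. The
function names, the bandwidth figure, and the snapshot size are illustrative
assumptions, not values from your ensemble, and the estimate ignores
deserialization time and other load on the server.

```python
def sync_window_ms(tick_time_ms: int, init_limit: int) -> int:
    """Time a follower has to connect and sync to the leader, in ms."""
    return tick_time_ms * init_limit


def estimated_transfer_ms(snapshot_bytes: int, bytes_per_sec: float) -> float:
    """Naive transfer-time estimate (assumption: steady throughput)."""
    return snapshot_bytes / bytes_per_sec * 1000.0


def fits_in_init_limit(tick_time_ms: int, init_limit: int,
                       snapshot_bytes: int, bytes_per_sec: float) -> bool:
    """True if the estimated snapshot transfer fits in the initLimit window."""
    window = sync_window_ms(tick_time_ms, init_limit)
    return estimated_transfer_ms(snapshot_bytes, bytes_per_sec) <= window


if __name__ == "__main__":
    # Hypothetical numbers: tickTime=2000, initLimit=10, a 2 GiB snapshot,
    # and an assumed 50 MB/s of effective throughput between servers.
    ok = fits_in_init_limit(2000, 10, 2 * 1024**3, 50e6)
    print("snapshot fits in initLimit window:", ok)
```

If the estimate comes out close to or above the window, raising initLimit (or
reducing the snapshot size) is the first thing I would try.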

It is possible that clients trying to reconnect cause a load spike on the
server and push it over the limit, but you would need a lot of clients for
that to happen.

I think it will be easier to narrow down the problem by checking in which
phase (e.g. leader election or synchronization) quorum formation fails.


-- 
Thawan Kooburat





On 5/13/13 10:48 AM, "Marshall McMullen" <[email protected]>
wrote:

>I'm debugging a problem we're seeing where after quorum loss quorum does
>not recover as I expect it should. It seems that I've isolated the problem
>to quorum not being re-established if there are clients trying to connect to
>the ensemble at the same time that the nodes are coming up and trying to
>form quorum. Is there any known issue with this? I've searched for open
>Jiras without any luck.
