[
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363414#comment-14363414
]
ASF GitHub Bot commented on AMQ-5082:
-------------------------------------
GitHub user jimrobinson opened a pull request:
https://github.com/apache/activemq/pull/74
AMQ-5082 unit test and patch
A proposed unit test and patch to address the issue raised in AMQ-5082
(the quorum never recovers after a broker's ZooKeeper session expires)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jimrobinson/activemq master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/activemq/pull/74.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #74
----
commit 58b7198880f5296af6b2e4e9efbbdfdb51220411
Author: James A. Robinson
Date: 2015-03-06T23:22:46Z
unit test for AMQ-5082
commit d272a116ff5c0916a6044d657f99df48f264bd2a
Author: James A. Robinson
Date: 2015-03-11T20:47:14Z
recompute ZooKeeperTreeTracker on reconnect
commit 8e5558c731fe0ddeee4136d806315023c47f108c
Author: James A. Robinson
Date: 2015-03-16T16:10:24Z
synchronize the entire tree check/rebuild operation; use a boolean instead
of a counter to track reconnected state
----
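The last two commits describe recomputing the tracker's cached tree after a
reconnect, doing the check and the rebuild under one lock, and recording the
reconnect with a boolean rather than a counter. A minimal Java sketch of that
pattern follows. It is not the code from the pull request: TreeCacheSketch,
loader, onReconnect, snapshot and rebuildCount are hypothetical names, and the
sketch does not use the ZooKeeper client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a tracker that caches a remote tree of nodes. */
class TreeCacheSketch {
    // Assumption: loader reads the current tree from the server on demand.
    private final Supplier<Map<String, String>> loader;
    private Map<String, String> nodes;
    // A flag, not a counter: several reconnect events collapse into one rebuild.
    private boolean reconnected = false;
    private int rebuilds = 0;

    TreeCacheSketch(Supplier<Map<String, String>> loader) {
        this.loader = loader;
        this.nodes = new HashMap<>(loader.get());
    }

    /** Called by the connection-event thread once a session is re-established. */
    synchronized void onReconnect() {
        reconnected = true;
    }

    /** The check and the rebuild share one lock, so callers never see a half-rebuilt tree. */
    synchronized Map<String, String> snapshot() {
        if (reconnected) {
            nodes = new HashMap<>(loader.get());
            reconnected = false;
            rebuilds++;
        }
        return new HashMap<>(nodes);
    }

    synchronized int rebuildCount() {
        return rebuilds;
    }
}
```

In this sketch a stale cache is only refreshed on the first read after a
reconnect, which keeps the event handler cheap; whether the actual patch
rebuilds eagerly or lazily is not stated in the commit messages.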
> ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
> -------------------------------------------------------------------
>
> Key: AMQ-5082
> URL: https://issues.apache.org/jira/browse/AMQ-5082
> Project: ActiveMQ
> Issue Type: Bug
> Components: activemq-leveldb-store
> Affects Versions: 5.9.0, 5.10.0
> Reporter: Scott Feldstein
> Priority: Critical
> Attachments: 03-07.tgz, amq_5082_threads.tar.gz,
> mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure,
> zookeeper.out-cluster.failure
>
>
> I have a three-node AMQ cluster and one ZooKeeper node, using a
> replicatedLevelDB persistence adapter.
> {code}
> <persistenceAdapter>
>   <replicatedLevelDB
>     directory="${activemq.data}/leveldb"
>     replicas="3"
>     bind="tcp://0.0.0.0:0"
>     zkAddress="zookeep0:2181"
>     zkPath="/activemq/leveldb-stores"/>
> </persistenceAdapter>
> {code}
> After about a day of sitting idle, cascading failures occur and the
> cluster stops listening altogether.
> I can reproduce this consistently on 5.9 and the latest 5.10 (commit
> 2360fb859694bacac1e48092e53a56b388e1d2f0). I am going to attach logs from
> the three mq nodes and the zookeeper logs that reflect the time where the
> cluster starts having issues.
> The cluster stops listening at Mar 4, 2014 4:56:50 AM (within 5 seconds).
> The OSs are all CentOS 5.9 on one ESX server, so I doubt networking is an
> issue.
> If you need more data it should be pretty easy to get whatever is needed
> since it is consistently reproducible.
> This bug may be related to AMQ-5026, but looks different enough to file a
> separate issue.