[
https://issues.apache.org/jira/browse/AMQ-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652207#comment-14652207
]
John Lindwall commented on AMQ-5897:
------------------------------------
The use of a network of brokers seems to be the root cause of the lost messages
during failover, so we are abandoning it. Instead we will use
Master/Slave shared persistence (a single master broker alive at any one time,
with N slaves ready to take over in the event of a master failure).
I had hoped to use JDBC persistence, but it did not perform well enough for us
under a reasonable load (6+ seconds on average to deliver a message), so we are
using KahaDB instead, which is much faster.
We will have hundreds, if not thousands of connections to the single master; I
am hoping that will not be an issue.
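For anyone following along, here is a rough sketch of how clients might be
pointed at a shared-persistence master/slave group like the one described
above. It only builds the connection URL string for ActiveMQ's failover
transport. The class name FailoverUrl, the method build, and the host names are
made up for illustration; randomize=false is my assumption so that clients try
the brokers in the listed order rather than at random.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FailoverUrl {

    // Hypothetical helper (not part of ActiveMQ): turns "host:port" entries
    // into a failover: URL that lists every broker in the master/slave group.
    public static String build(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one broker is required");
        }
        return hostPorts.stream()
                .map(hp -> "tcp://" + hp)
                // randomize=false is an assumption: try brokers in list order
                .collect(Collectors.joining(",", "failover:(", ")?randomize=false"));
    }

    public static void main(String[] args) {
        // Placeholder host names; 61616 is the usual OpenWire port.
        System.out.println(build(List.of("broker1:61616", "broker2:61616")));
    }
}
```

The resulting string would be passed to the client's connection factory. Only
the broker that currently holds the lock on the shared store accepts
connections, so the client should end up on whichever broker is master.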
> Slave fails to deliver messages
> -------------------------------
>
> Key: AMQ-5897
> URL: https://issues.apache.org/jira/browse/AMQ-5897
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.10.0, 5.11.1
> Environment: Solaris 5.11
> Reporter: John Lindwall
> Attachments: ActiveMQFailOverDurableMessageListener.java,
> ActiveMQFailOverMessageSender.java, master1-activemq.xml,
> master2-activemq.xml, slave1-activemq.xml, slave2-activemq.xml
>
>
> When a slave takes over for a failed master, pending messages are not
> delivered.
> I have a 5.11 cluster consisting of 2 pairs of master/slaves: m1/s1 and
> m2/s2. They use multicast://default for their networkConnectors. 1
> subscriber, 1 publisher, also both using multicast urls. My subscriber is a
> durable subscriber. Msgs are persistent.
> I am testing system robustness in the face of a master failure. I have 3
> test cases, of which 2 behave as expected and 1 is problematic. My publisher
> connects to a master, sends a set of 10 persistent messages and exits. The
> subscriber (durable) receives a message and spends 1 sec simulating
> processing time, and waits for the next msg (auto-acknowledge).
> For each test case I connect the subscriber, then publish the message set,
> then kill a master after a few messages are received by the subscriber. When
> the slave comes online I expect the remaining msgs to be delivered.
> 1. subscribe to m2, publish to m2, kill m2. Messages are all delivered
> 2. subscribe to m1, publish to m2, kill m2. Messages are all delivered
> 3. subscribe to m1, publish to m2, kill m1. Remaining msgs are NOT DELIVERED
> :(
> In case #3, when m1 is killed I can see the subscriber reconnecting to m2.
> The remaining messages are not delivered at that time though.
> If I then connect the subscriber directly to s1 (using tcp:// url), the
> remaining msgs are indeed delivered. I would have expected s1 to route the
> remaining msgs to m2 during the test execution, but that did not happen.
> When I "kill" the master I mean that I do "kill -9 XXXX".