[
https://issues.apache.org/activemq/browse/AMQ-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=50304#action_50304
]
Aaron Riekenberg commented on AMQ-2149:
---------------------------------------
I tried several more tests using the script and test program above, with
similarly bad results. So far I am unable to find a completely reliable way to
implement master/slave failover with a shared filesystem.
I have tried all tests so far on both Apache ActiveMQ 5.2.0 and the new FUSE
message broker 5.3.0.0 with identical results.
1. In my original test, the syncOnWrite parameter for the amqPersistenceAdapter
was set to the default value "false". I thought this might be part of my
problem, so I changed it to syncOnWrite="true". I am certain changing
syncOnWrite had an effect, because it reduced the rate of messages being sent
and received to 20 per second. The test program still used AUTO_ACKNOWLEDGE in
the sender and receiver. This failed after 3 master/slave failovers:
Mar 7, 2009 7:35:42 AM org.aaron.MasterSlaveTest$Receiver onMessage
WARNING: test.queue.9 received 1904 expected 1903
2. Next I set syncOnWrite="false" and enabled transactions in both Sender and
Receiver. To do this I changed the call to createSession to pass the parameters
"true" and "Session.SESSION_TRANSACTED", and I called session.commit() after
each send and receive. See MasterSlaveTestWithTransactions.java.
This failed after 4 master/slave failovers:
Mar 7, 2009 7:12:55 AM org.aaron.MasterSlaveTest$Receiver onMessage
WARNING: test.queue.8 received 1530 expected 3703
3. Finally I set syncOnWrite="true" and ran again with transactions enabled in
both Sender and Receiver (MasterSlaveTestWithTransactions.java).
This failed after 6 master/slave failovers:
Mar 7, 2009 7:32:19 AM org.aaron.MasterSlaveTest$Receiver onMessage
WARNING: test.queue.3 received 1734 expected 1725
> Shared Filesystem Master Slave: missing messages
> ------------------------------------------------
>
> Key: AMQ-2149
> URL: https://issues.apache.org/activemq/browse/AMQ-2149
> Project: ActiveMQ
> Issue Type: Bug
> Affects Versions: 5.2.0
> Environment: Ubuntu Linux 8.10 AMD64, Sun JDK 1.6.0.10
> Reporter: Aaron Riekenberg
> Attachments: activemq.xml, MasterSlaveTest.java,
> run_master_slave_brokers.sh
>
>
> I'm finding that messages are occasionally not delivered in order in a shared
> filesystem master slave setup when the master fails and the slave takes over.
> I'm running a simple test on one physical machine where the shared
> filesystem is on a single disk (no SAN currently involved).
> I'm attaching a shell script (run_master_slave_brokers.sh) that starts a
> master and slave broker in the same directory, sleeps 20 seconds, kills the
> master, sleeps 20 seconds, starts a new slave, sleeps 20 seconds, kills the
> master, etc.
> Also attached is a small Java test program (MasterSlaveTest.java). The
> program starts 10 JMS senders that send 75kb text messages every 25 ms to
> unique queues. These messages contain a sequence number header (a long).
> The program also starts 10 receivers (1 for each queue) that keep track of
> the next expected sequence number and validate each incoming sequence number.
> If a receiver gets an unexpected sequence number, the test program exits
> (System.exit(1)). Both the senders and receivers use the failover transport
> to connect to the broker. Messages being sent are persistent, so in theory
> there should be no message loss when the master fails and slave takes over.
> I run the script to start the brokers, then run my test program. Most times
> when the script kills the master and the slave is promoted, things work fine
> - the test program reconnects, and messages continue to be delivered in
> order. If I run this long enough though, eventually my test program fails
> just after a slave broker is promoted to master with output similar to this:
> Mar 6, 2009 11:58:12 AM
> org.apache.activemq.transport.failover.FailoverTransport doReconnect
> INFO: Successfully reconnected to tcp://localhost:61616
> Mar 6, 2009 11:58:12 AM org.aaron.MasterSlaveTest$Receiver onMessage
> WARNING: test.queue.3 received 630 expected 629
> This indicates the receiver for test.queue.3 received message 630 after the
> slave broker took over and missed message 629.
> This seems to happen more often when more senders and receivers are running
> and more queues are in use. If I run a single sender/receiver pair on 1
> queue, it is very difficult to make this happen.
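For reference, the receiver-side check described in the issue could be sketched
roughly as below. This is not the code from MasterSlaveTest.java: the class name
SequenceChecker, the Result enum, and the method names are hypothetical, and the
sketch assumes each sender numbers its messages consecutively per queue. It also
distinguishes a gap (received above expected, as in "received 630 expected 629")
from a replay (received below expected, as in "received 1530 expected 3703"),
which the WARNING lines themselves do not label.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; an illustration only, not taken from MasterSlaveTest.java. */
class SequenceChecker {

    /** Outcome of comparing a received sequence number with the expected one. */
    enum Result { IN_ORDER, GAP, REPLAY }

    // Next expected sequence number, tracked separately for each queue name.
    private final Map<String, Long> nextExpected = new HashMap<>();

    // Sequence number assumed for the first message on any queue.
    private final long firstSequence;

    SequenceChecker(long firstSequence) {
        this.firstSequence = firstSequence;
    }

    /** Returns the sequence number currently expected on the given queue. */
    long expected(String queue) {
        return nextExpected.getOrDefault(queue, firstSequence);
    }

    /**
     * Compares a received sequence number with the expected one.
     * The expected value advances only when the message arrives in order.
     */
    Result check(String queue, long received) {
        long expected = expected(queue);
        if (received == expected) {
            nextExpected.put(queue, expected + 1);
            return Result.IN_ORDER;
        }
        // Above expected: at least one message was skipped.
        // Below expected: a sequence number that was already passed arrived again.
        return received > expected ? Result.GAP : Result.REPLAY;
    }

    public static void main(String[] args) {
        // Mirrors "test.queue.3 received 630 expected 629" from the issue.
        SequenceChecker checker = new SequenceChecker(629L);
        System.out.println(checker.check("test.queue.3", 630L)); // prints GAP
    }
}
```

Under those assumptions, a GAP points at skipped messages, while a REPLAY points
at a message that had already been consumed being delivered again. The real test
program simply exits on any mismatch.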