[
https://issues.apache.org/activemq/browse/AMQ-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=51032#action_51032
]
Aaron Riekenberg commented on AMQ-2149:
---------------------------------------
I checked out the current ActiveMQ 5.3-SNAPSHOT from trunk (r762095) and built
it. This should have all of Gary's fixes in it, correct?
I used the attached activemq.xml configuration file and ran my MasterSlaveTest
and run_master_slave_brokers.sh to cause failovers. This version of the test
does not use transactions, and syncOnWrite was set to false.
After 3 master/slave failovers, test.queue.2 redelivered message 705 when
the client expected 3570.
From my perspective, it appears the problem is not solved by the changes
listed so far. Therefore I am reopening this issue.
Output from MasterSlaveTest:
{{INFO: Successfully reconnected to tcp://localhost:61616}}
{{Apr 5, 2009 7:31:55 AM org.apache.activemq.transport.failover.FailoverTransport doReconnect}}
{{INFO: Successfully reconnected to tcp://localhost:61616}}
{{Apr 5, 2009 7:31:56 AM org.aaron.MasterSlaveTest$Receiver onMessage}}
{{WARNING: test.queue.2 received 705 expected 3570}}
Output from run_master_slave_brokers.sh:
{{Sun Apr 5 07:30:13 CDT 2009 started master broker pid 5691}}
{{Sun Apr 5 07:30:23 CDT 2009 started slave broker pid 5901}}
{{Sun Apr 5 07:30:33 CDT 2009 killing master broker pid 5691, new master pid 5901}}
{{Sun Apr 5 07:30:53 CDT 2009 started slave broker pid 6112}}
{{Sun Apr 5 07:31:13 CDT 2009 killing master broker pid 5901, new master pid 6112}}
{{Sun Apr 5 07:31:33 CDT 2009 started slave broker pid 6336}}
{{Sun Apr 5 07:31:53 CDT 2009 killing master broker pid 6112, new master pid 6336}}
Log file is attached as activemq.log.2009_04_05.
> Shared Filesystem Master Slave: missing messages
> ------------------------------------------------
>
> Key: AMQ-2149
> URL: https://issues.apache.org/activemq/browse/AMQ-2149
> Project: ActiveMQ
> Issue Type: Bug
> Affects Versions: 5.2.0
> Environment: Ubuntu Linux 8.10 AMD64, Sun JDK 1.6.0.10
> Reporter: Aaron Riekenberg
> Assignee: Gary Tully
> Fix For: 5.3.0
>
> Attachments: activemq.log, activemq.log.2009_03_12_1,
> activemq.log.2009_03_12_2, activemq.log.2009_04_05, activemq.xml,
> AMQ-2149.zip, amq2149.patch, MasterSlaveTest.java,
> MasterSlaveTestWithTransactions.java, run_master_slave_brokers.sh
>
>
> I'm finding that messages are occasionally not delivered in order in a shared
> filesystem master slave setup when the master fails and the slave takes over.
> I'm running a simple test on one physical machine where the shared
> filesystem is on a single disk (no SAN currently involved).
> I'm attaching a shell script (run_master_slave_brokers.sh) that starts a
> master and slave broker in the same directory, sleeps 20 seconds, kills the
> master, sleeps 20 seconds, starts a new slave, sleeps 20 seconds, kills the
> master, etc.
> Also attached is a small Java test program (MasterSlaveTest.java). The
> program starts 10 JMS senders that send 75kb text messages every 25 ms to
> unique queues. These messages contain a sequence number header (a long).
> The program also starts 10 receivers (1 for each queue) that keep track of
> the next expected sequence number and validate each incoming sequence number.
> If a receiver gets an unexpected sequence number, the test program exits
> (System.exit(1)). Both the senders and receivers use the failover transport
> to connect to the broker. Messages being sent are persistent, so in theory
> there should be no message loss when the master fails and slave takes over.
> I run the script to start the brokers, then run my test program. Most times
> when the script kills the master and the slave is promoted, things work fine
> - the test program reconnects, and messages continue to be delivered in
> order. If I run this long enough though, eventually my test program fails
> just after a slave broker is promoted to master with output similar to this:
> Mar 6, 2009 11:58:12 AM
> org.apache.activemq.transport.failover.FailoverTransport doReconnect
> INFO: Successfully reconnected to tcp://localhost:61616
> Mar 6, 2009 11:58:12 AM org.aaron.MasterSlaveTest$Receiver onMessage
> WARNING: test.queue.3 received 630 expected 629
> This indicates the receiver for test.queue.3 received message 630 after the
> slave broker took over and missed message 629.
> This seems to happen more often when more senders and receivers are running
> and more queues are in use. If I run a single sender/receiver pair on 1
> queue, it is very difficult to make this happen.
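
For readers without the attachments, the receiver-side check described in the
issue can be sketched roughly as below. This is not the attached
MasterSlaveTest.java: the class name SequenceChecker, the check() method, and
the assumption that each queue starts at sequence number 0 are hypothetical,
and the JMS plumbing (failover transport, persistent delivery) is left out so
the sketch runs on its own. It only shows the per-queue "next expected
sequence number" bookkeeping and the warning format seen in the logs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the receiver logic; not the attached test program.
final class SequenceChecker {
    // Next expected sequence number per queue name.
    // Assumption: every queue starts at sequence number 0.
    private final Map<String, Long> nextExpected = new HashMap<>();

    /**
     * Validates one incoming sequence number for a queue.
     * Returns null when the message is in order, otherwise a warning
     * string in the same shape as the log lines in this issue.
     */
    public synchronized String check(String queue, long received) {
        long expected = nextExpected.getOrDefault(queue, 0L);
        if (received != expected) {
            return queue + " received " + received + " expected " + expected;
        }
        nextExpected.put(queue, expected + 1);
        return null;
    }

    public static void main(String[] args) {
        SequenceChecker checker = new SequenceChecker();
        // Simulated arrivals on one queue: message 3 is missing.
        long[] arrivals = {0, 1, 2, 4};
        for (long seq : arrivals) {
            String warning = checker.check("test.queue.3", seq);
            if (warning != null) {
                System.out.println("WARNING: " + warning);
                return; // the test described in the issue exits at this point
            }
        }
    }
}
```

With this kind of check, a skipped message shows up as received > expected
(630 vs. 629 in the original report), while a redelivered old message shows up
as received < expected (705 vs. 3570 in the run above).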
--
This message is automatically generated by JIRA.