[jira] Created: (AMQ-2317) Duplicate messages with transacted persistent messages during JDBC Master/Slave failover

Daniel Mueller (JIRA) Sun, 05 Jul 2009 06:52:30 -0700

Duplicate messages with transacted persistent messages during JDBC Master/Slave 
failover
----------------------------------------------------------------------------------------


                 Key: AMQ-2317
                 URL: https://issues.apache.org/activemq/browse/AMQ-2317
             Project: ActiveMQ
          Issue Type: Bug
    Affects Versions: 5.3.0
         Environment: OS: MacOS X  10.5.7 MacBook Core 2 Duo 2 Ghz
DBMS: MySQL 5.0.83 (through macports), SQLServer 2005 (in VMWare), others 
suspected but not thoroughly tested (including HSQL)
All observations are against trunk: rev 790957 (2009-07-03 23:07:04 +0700 (Fri, 
03 Jul 2009)) (fuse progress 5.3.0.3 and ActiveMQ 5.2.0 seem to have the same 
problem though)
            Reporter: Daniel Mueller
            Priority: Critical


There is a race condition somewhere in the transaction/replay code involving 
failovers of JDBC only Master/Slave configurations.

Observed problems:

If messages are sent to a master broker in one transaction, and the master 
fails over to the slave while that transaction is still open, then the messages 
seem to be replayed twice: the database holds duplicates (see the query at the 
end) and the broker reports a message count that includes the duplicates.

Severity: 
If the clients are connected to the new master and start consuming, the broker 
will not deliver dups. The dups will be delivered, though, if there is another 
failover (a common case for system upgrades). It seems that a single consumer 
will not get duplicates, even if it fails over again to a new broker, but if 
the consumer is restarted, it loses its state as well and subsequently gets 
the duplicates delivered.

Attached is a testcase that demonstrates the problem. It shows that a single 
producer committing after each send creates one additional message in the 
broker with a duplicate MSGID_SEQ. If everything is committed in one 
transaction, then every single message in the transaction is duplicated (and 
not only the ones sent before the failover occurred).

The testcase uses an external MySQL instance though, and needs the DBCP and the 
MySQL JDBC connector on the classpath (the pom is patched in the attached file 
to resolve that automatically).
Out of the 6 tests, the following almost always fail on my machine:

testProducer_MasterFailoverByShutdown_AtRandomTimes_CommitPerMessage  (expected 
<6000>, but was <6001>)
testProducer_MasterFailoverByShutdown_AtRandomTimes_OneCommit  (expected 
<6000>, but was <12000>)

Rarely (3-5% of the cases) this one also fails:
testProducer_MasterFailoverByShutdown_SingleMsgCommit_AfterCommit  (expected 
<500>, but was <501>)


Other observations made:
1) The problem seems to be a race condition: while I was trying to find the 
cause through debugging, the problem disappeared when I set a breakpoint in 
TransactionInfo.visit(line:100). Without a breakpoint, the race condition is 
hit on my machine (specs above) basically all the time (from maven, on the 
shell with a build, and inside eclipse both debugged and normal).
2) It seems that TransactionBroker.commitTransaction(line:100) is called once 
with duplicated synchronizations (2x size). On the other hand 
MemoryTransactionStore$Tx(line:109) is called twice with the correct amount 
first, and later a doubled amount.
3) The problem is not reproducible with Kaha, so it appears to be related to 
JDBC.
4) It might be possible to have the testcase fail reliably with one of 
Derby/HSQL/H2, but I didn't investigate.
5) The testcase is not exactly very pretty, but it does show the problem ;)
6) The attached testcase is a patch against activemq-core.
7) The tests can be executed directly (in bash) with:
env MAVEN_OPTS="$MAVEN_OPTS -Xmx800M" mvn \
  -Dtest=org.apache.activemq.transport.failover.FailoverTransactionalTest test
8) For MySQL the following should work: 
SELECT 
      MSGID_PROD
     ,MSGID_SEQ
  FROM activemq_msgs
GROUP BY MSGID_PROD,MSGID_SEQ
HAVING ( COUNT(MSGID_SEQ) > 1 );

9) If you need the my.cnf for the database, I can attach that as well.
10) The tables are correctly created as InnoDB
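
For anyone who wants to run the same duplicate check as the query in item 8 
against rows already fetched into a test, here is a minimal Java sketch. It 
only mirrors the GROUP BY / HAVING COUNT > 1 logic in memory. The class and 
method names (DuplicateMessageCheck, MsgKey, findDuplicates) and the sample 
producer ID are made up for illustration and are not part of the attached 
testcase or of ActiveMQ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class DuplicateMessageCheck {

    /** Hypothetical holder for one (MSGID_PROD, MSGID_SEQ) pair from activemq_msgs. */
    public static final class MsgKey {
        final String producerId;
        final long sequence;

        public MsgKey(String producerId, long sequence) {
            this.producerId = producerId;
            this.sequence = sequence;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof MsgKey)) return false;
            MsgKey other = (MsgKey) o;
            return sequence == other.sequence && producerId.equals(other.producerId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(producerId, sequence);
        }

        @Override
        public String toString() {
            return producerId + "/" + sequence;
        }
    }

    /** Returns every key that occurs more than once, mapped to its row count. */
    public static Map<MsgKey, Integer> findDuplicates(List<MsgKey> rows) {
        Map<MsgKey, Integer> counts = new LinkedHashMap<>();
        for (MsgKey row : rows) {
            counts.merge(row, 1, Integer::sum); // GROUP BY MSGID_PROD, MSGID_SEQ
        }
        counts.values().removeIf(c -> c < 2);   // HAVING COUNT(...) > 1
        return counts;
    }

    public static void main(String[] args) {
        // Sample rows are invented; in a real test they would come from a JDBC query.
        List<MsgKey> rows = new ArrayList<>();
        rows.add(new MsgKey("ID:producer-1", 1));
        rows.add(new MsgKey("ID:producer-1", 2));
        rows.add(new MsgKey("ID:producer-1", 2)); // simulated replayed message
        System.out.println(findDuplicates(rows));
    }
}
```

An empty result means no (MSGID_PROD, MSGID_SEQ) pair was stored more than 
once, which is what a correct failover should produce.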

I think that's it...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
