tai-jen gordon created AMQ-6333:
-----------------------------------

             Summary: queue messages lost during failover
                 Key: AMQ-6333
                 URL: https://issues.apache.org/jira/browse/AMQ-6333
             Project: ActiveMQ
          Issue Type: Bug
          Components: activemq-leveldb-store
    Affects Versions: 5.13.2
         Environment: a 3 node cluster running on RHEL 6.7
            Reporter: tai-jen gordon


In a 3-node cluster running ActiveMQ 5.13.2 with LevelDB, we tested failover 
with a producer that generates 200 accountids (0 to 199), and expected 
the consumer to receive the same number of messages after a triggered 
failover (kill -9 master-process-id). 

The producer used syncSend, the consumer used a transacted session, and the 
destination is a queue.

We printed the message sequence number and its corresponding accountid on both 
the producer and the consumer consoles after each message was successfully sent 
and received, respectively.

The result showed that the producer sent 199 messages (one failed due to the 
failover), but the consumer received far fewer than 199 messages.

For example, on the consumer console, before the failover we had
          ......
          received = 13    accountid=12
          received = 14    accountid=13
then, right after the failover, the consumer console had
          received = 15    accountid=44
          received = 16    accountid=45
          ...
Notice that accountids 14 to 43 are lost permanently. 

We scanned the broker logs of both the new and the old master and noticed that 
pagedInPendingDispatch.size in the "old" master had the value 30, which matches 
the number of lost messages. We repeated this test a few times, and each time 
this size matched the number of lost messages. 
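
The count of lost messages can be cross-checked mechanically from the consumer 
output. The following is a minimal sketch, not part of the original test 
programs: the class name AccountIdGaps and the method missingBetween are 
hypothetical, and the input is only the four accountid values visible in the 
console excerpt above. It assumes the consumer prints accountids in ascending 
order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
final class AccountIdGaps {

    // Returns every id skipped between consecutive entries of an
    // ascending list of received accountids.
    static List<Integer> missingBetween(List<Integer> receivedIds) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 1; i < receivedIds.size(); i++) {
            int prev = receivedIds.get(i - 1);
            int next = receivedIds.get(i);
            for (int id = prev + 1; id < next; id++) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // accountid values copied from the consumer console excerpt
        List<Integer> received = Arrays.asList(12, 13, 44, 45);
        List<Integer> missing = missingBetween(received);
        // prints: missing count = 30, first = 14, last = 43
        System.out.println("missing count = " + missing.size()
                + ", first = " + missing.get(0)
                + ", last = " + missing.get(missing.size() - 1));
    }
}
```

For the excerpt above this yields 30 missing accountids (14 through 43), the 
same number as the pagedInPendingDispatch.size seen on the old master.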

The configuration we used is listed below:
Broker configuration: 
 <broker xmlns="http://activemq.apache.org/schema/core" brokerName="broker" 
dataDirectory="${activemq.data}" useJmx="true" useShutdownHook="false" 
shutdownOnMasterFailure="true" systemExitOnShutdown="true" 
systemExitOnShutdownExitCode="200">

        <destinationPolicy>
            <policyMap>
              <policyEntries>
                <policyEntry topic=">">
                  <pendingMessageLimitStrategy>
                    <constantPendingMessageLimitStrategy limit="1000"/>
                  </pendingMessageLimitStrategy>
                </policyEntry>
                <policyEntry queue=">" useCache="false" expireMessagesPeriod="0"/>
              </policyEntries>
            </policyMap>
        </destinationPolicy>

        <persistenceAdapter>
            <replicatedLevelDB 
              directory="${activemq.base}/leveldb-data" 
              replicas="3" 
              bind="tcp://0.0.0.0:0" 

              zkAddress="test1:2181,test2:2181,test3:2181" 
              zkPath="/activemq/leveldb" 
              sync="quorum_disk" 
              paranoidChecks="true" 
              logCompression="snappy" 
             />
        </persistenceAdapter>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
