[
https://issues.apache.org/jira/browse/AMQ-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490017#comment-15490017
]
Gary Tully edited comment on AMQ-6429 at 9/14/16 10:03 AM:
-----------------------------------------------------------
Most likely there are some duplicate sends in the mix: producers that use
failover and had an in-flight send when they lost their connection and
reconnected.
The first approach may be to use maxReconnectAttempts=0 in the failover URLs
so that the application sees these connection failures and can deal with them
by sending new messages that the broker won't see as duplicates.
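
As a rough illustration of the URL form, here is a minimal sketch that only
assembles the failover URL string. The class name, helper method and broker
host names are hypothetical; only the failover:(...) syntax and the
maxReconnectAttempts option come from the suggestion above.

```java
import java.util.Arrays;
import java.util.List;

public class FailoverUrl {

    /**
     * Builds a failover transport URL for the given broker URIs.
     * Hypothetical helper, not part of the ActiveMQ client API.
     */
    public static String build(List<String> brokerUris, int maxReconnectAttempts) {
        return "failover:(" + String.join(",", brokerUris) + ")"
                + "?maxReconnectAttempts=" + maxReconnectAttempts;
    }

    public static void main(String[] args) {
        // Host names are placeholders for the real master/slave brokers.
        String url = build(
                Arrays.asList("tcp://broker-a:61616", "tcp://broker-b:61616"), 0);
        System.out.println(url);
        // With the ActiveMQ client on the classpath, this string would be
        // handed to the connection factory, e.g.
        //   new ActiveMQConnectionFactory(url)
        // after which a lost connection surfaces to the application as a
        // JMSException instead of being retried transparently.
    }
}
```

The application then decides how to handle the failure, for example by
reconnecting and sending a fresh message rather than having the transport
replay the in-flight one.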
The other possibility is a bug in the sync between the cursor and the store.
Disabling the cursor cache may avoid that scenario.
There are message audits on the cursors. If they detect a duplicate, they
redirect it to the DLQ, so that no message is lost if the duplicate suppression
itself is in error. From that perspective the DLQ logging looks OK.
However, with 8 duplicates, it may be that the cursor audit needs to be
configured with larger limits so that it suppresses more duplicates.
See PolicyEntry: setMaxProducersToAudit (the maximum number of concurrent
producers to track, default 64) and setMaxAuditDepth (the range to track, a
transaction batch size). Most likely setMaxProducersToAudit needs to be larger
for your setup.
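
To show why the two limits matter, here is a toy model of a bounded duplicate
audit. It is a sketch under stated assumptions, not ActiveMQ's implementation:
the class ToyMessageAudit and its internals are hypothetical, and only the two
limit names mirror the PolicyEntry setters mentioned above. It tracks at most
maxProducersToAudit producers (least recently used is evicted) and at most
maxAuditDepth recent sequence numbers per producer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy duplicate audit; NOT the ActiveMQ implementation. */
public class ToyMessageAudit {

    /** Recent sequence numbers seen for one producer. */
    private static final class Window {
        final Deque<Long> order = new ArrayDeque<>();
        final Set<Long> seen = new HashSet<>();
    }

    private final int maxAuditDepth;
    private final Map<String, Window> producers;

    public ToyMessageAudit(final int maxProducersToAudit, int maxAuditDepth) {
        this.maxAuditDepth = maxAuditDepth;
        // Access-ordered map: the least recently used producer is evicted
        // once more than maxProducersToAudit producers are tracked.
        this.producers = new LinkedHashMap<String, Window>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Window> eldest) {
                return size() > maxProducersToAudit;
            }
        };
    }

    /** Returns true if (producerId, sequence) is still inside the audit window. */
    public boolean isDuplicate(String producerId, long sequence) {
        Window w = producers.get(producerId);
        if (w == null) {
            w = new Window();
            producers.put(producerId, w); // may evict another producer
        }
        if (!w.seen.add(sequence)) {
            return true; // already tracked: duplicate detected
        }
        w.order.addLast(sequence);
        if (w.order.size() > maxAuditDepth) {
            // Oldest sequence falls out of the window and is forgotten.
            w.seen.remove(w.order.removeFirst());
        }
        return false;
    }

    public static void main(String[] args) {
        ToyMessageAudit audit = new ToyMessageAudit(2, 10);
        audit.isDuplicate("p1", 1);
        audit.isDuplicate("p2", 1);
        audit.isDuplicate("p3", 1); // third producer evicts p1
        // The resend from p1 is no longer recognised as a duplicate.
        System.out.println(audit.isDuplicate("p1", 1));
    }
}
```

In this model, a resend from a producer that has been evicted, or whose
sequence number has slid out of the window, passes the audit unnoticed. That
is the motivation for sizing the producer limit at or above the number of
concurrent producers.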
To fully understand what is going on, we need a scenario that reproduces the
problem, ideally against the master code base, which contains all of the latest
fixes.
> lost messages
> --------------
>
> Key: AMQ-6429
> URL: https://issues.apache.org/jira/browse/AMQ-6429
> Project: ActiveMQ
> Issue Type: Bug
> Affects Versions: 5.11.4
> Reporter: Asbjørn Aarrestad
>
> We have experienced a problem during somewhat high load (>500 000 messages
> over 30 minutes to multiple queues), where 2 messages were both delivered and
> DLQ’ed, 8 messages were delivered twice and 7 messages disappeared (but then
> upon inspection 6 of them are present in the AMQ database, somehow without AMQ
> noticing).
> We are running an ActiveMQ 5.11.4 broker in JDBC Master Slave setup with
> MSSQL server as persistent store, with 2 slaves (i.e. hot standbys).
> There was no master-switching (failover) during the incident.
> We have no indication that there were problems on the MS SQL server at the
> time.
> There are only two log lines in the ActiveMQ log at the time of the incident:
> 2016-08-29 06:03:54,857 [ActiveMQ BrokerService[svg-amq03_61616] Task-351933]
> WARN o.a.a.b.r.c.AbstractStoreCursor -
> org.apache.activemq.broker.region.cursors.QueueStorePrefetch@24b740dc:stow:AgreementService.private.createOrder,batchResetNeeded=false,size=1,cacheEnabled=false,maxBatchSize:1,hasSpace:true,pendingCachedIds.size:0,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:null,lastAsyncCachedId-seq:null,store=stow:AgreementService.private.createOrder,pendingSize:1
> - cursor got duplicate from store
> ID:svg-agreement01-49217-1471461423933-1:15:1:6980:1 seq: 1233709
> 2016-08-29 06:03:54,857 [ActiveMQ BrokerService[svg-amq03_61616] Task-351933]
> WARN o.a.activemq.broker.region.Queue - duplicate message from store
> ID:svg-agreement01-49217-1471461423933-1:15:1:6980:1, redirecting for dlq
> processing
> 2016-08-29 06:03:54,920 [ActiveMQ BrokerService[svg-amq03_61616] Task-351926]
> WARN o.a.a.b.r.c.AbstractStoreCursor -
> org.apache.activemq.broker.region.cursors.QueueStorePrefetch@24b740dc:stow:AgreementService.private.createOrder,batchResetNeeded=false,size=2,cacheEnabled=false,maxBatchSize:2,hasSpace:true,pendingCachedIds.size:0,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:null,lastAsyncCachedId-seq:null,store=stow:AgreementService.private.createOrder,pendingSize:1
> - cursor got duplicate from store
> ID:svg-agreement01-49217-1471461423933-1:15:1:6981:1 seq: 1233746
> 2016-08-29 06:03:54,920 [ActiveMQ BrokerService[svg-amq03_61616] Task-351926]
> WARN o.a.activemq.broker.region.Queue - duplicate message from store
> ID:svg-agreement01-49217-1471461423933-1:15:1:6981:1, redirecting for dlq
> processing
> The two DLQ’ed messages from the log lines were both delivered correctly, and
> then also DLQ’ed.
> At the same time we got 8 other messages delivered twice, and 7 messages
> looked like they were gone. When querying the AMQ database, 6 of the 7 lost
> messages are present in the database, but not present when querying MBeans
> for the queues – leaving 1 message without a trace. (We know it was sent, due
> to log lines from the application).
> All of this happened at the same time (during the same second), and all of the
> problematic messages in question were on the same queue.
> Any idea why this happened, and how to avoid it in the future?