Messages (possibly) stuck and pending messages count showing high number of 
pending messages which do not get sent to a consumer.
--------------------------------------------------------------------------------------------------------------------------------

                 Key: AMQ-3473
                 URL: https://issues.apache.org/jira/browse/AMQ-3473
             Project: ActiveMQ
          Issue Type: Bug
          Components: Message Store
    Affects Versions: 5.5.0
         Environment: Ubuntu 11.04 (64-bit)

Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

            Reporter: Mat Sharpe


Two brokers, each with a network connection to the other. We have two producers 
producing persistent messages to a single queue at a rate of 20-50/second. 
There is a single consumer. All clients prefer the primary broker.

The consumer is 'bursty' - i.e. it grabs 5000 messages and then processes them. 
During processing, new messages build up on the broker.

If the primary broker is restarted, it comes back, as you would expect, with a 
number of pending messages. This message count never fully returns to 0, even 
if the producers are stopped, and browsing the queue through the GUI shows 
either no messages or only messages that were produced since the restart.


I have turned on Kaha debugging and, after the initial restart, we see the 
following during every checkpoint:
 [eckpoint Worker] TRACE MessageDatabase                - Last update: 
3974:2450180, full gc candidates set: [3950, 3951, 3973, 3974]
...
 [eckpoint Worker] TRACE MessageDatabase                - gc candidates after 
dest:1:MyQueue, [3951, 3973]
...
 [eckpoint Worker] TRACE MessageDatabase                - gc candidates: [3951, 
3973]
 [eckpoint Worker] TRACE MessageDatabase                - not removing data 
file: 3951 as contained ack(s) refer to referenced file: [3950, 3951]
 [eckpoint Worker] DEBUG MessageDatabase                - Cleanup removing the 
data files: [3973]
(I assume that is supposed to say '[Checkpoint Worker]', incidentally)
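
To make the checkpoint trace above easier to follow, here is a minimal Python 
sketch of the decision it appears to log. It is a simplified model inferred 
only from the trace lines, not KahaDB's actual code. The function name 
removable_files and the ack_refs mapping are hypothetical names chosen for 
illustration.

```python
def removable_files(candidates, ack_refs):
    """Model of the final cleanup decision suggested by the trace.

    candidates: data file ids still eligible after the per-destination scan
                (the "gc candidates:" line).
    ack_refs:   hypothetical mapping from a data file id to the set of file
                ids that the acks stored in it refer to.
    """
    candidates = set(candidates)
    removable = set()
    for file_id in candidates:
        referenced = ack_refs.get(file_id, set())
        # Keep the file if any ack in it points at a file that is not itself
        # a deletion candidate, i.e. a file that is still in use.
        if all(ref in candidates for ref in referenced):
            removable.add(file_id)
    return removable


# Values taken from the first trace: 3951 holds acks referring to 3950 and
# 3951, and 3950 is not a gc candidate, so only 3973 is removed.
print(sorted(removable_files({3951, 3973}, {3951: {3950, 3951}})))  # [3973]
```

Under this model, 3951 stays on disk for as long as 3950 is still referenced, 
which matches the "not removing data file: 3951" line.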

After the second restart we will see many:
 [0.8.0.200:47300] WARN  MessageDatabase                - Duplicate message add 
attempt rejected. Destination: MyQueue, Message id: 
ID:node001-58675-1314038640553-0:17:1:1:470776

Followed by:
 [eckpoint Worker] TRACE MessageDatabase                - Last update: 
3974:13515407, full gc candidates set: [3950, 3951, 3974]
...
 [eckpoint Worker] TRACE MessageDatabase                - gc candidates after 
dest:1:MyQueue, [3951]
...
 [eckpoint Worker] TRACE MessageDatabase                - gc candidates: [3951]
 [eckpoint Worker] DEBUG MessageDatabase                - Cleanup removing the 
data files: [3951]



This is very similar to, if not the same as, AMQ-2955. I have tried setting 
'useCache=false' but this does not rectify the issue. This could also be 
similar to AMQ-3281.

I will attach a config. Please advise if you would like me to enable further 
debugging.

I don't currently have a test harness that replicates this issue and, because 
it is only happening in our production environment, I'm unable to verify 
reliably whether messages are being lost or delayed, or whether this is purely 
a stats issue.





        
