[ 
https://issues.apache.org/jira/browse/AMQ-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pete Bertrand updated AMQ-5424:
-------------------------------
    Attachment: activemq.xml
                thread-dump.txt

> Broker at 100% CPU when idle after Network Connection reconnect with 
> duplicates sent
> ------------------------------------------------------------------------------------
>
>                 Key: AMQ-5424
>                 URL: https://issues.apache.org/jira/browse/AMQ-5424
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.10.0
>            Reporter: Pete Bertrand
>         Attachments: activemq.xml, thread-dump.txt
>
>
> In a network of 2 brokers (A and B) with durable queued messages 
> going from A to B over a duplex NetworkConnector,
> if A is stopped and restarted while messages are in-flight, 
> and if replayed messages from A are recognized as duplicates on B,
> then 30 seconds after B goes idle, B's CPU goes to 100%.
> I have attached the thread dump to the ticket.
> From what I have been able to figure out, the dequeue counter is not
> incremented when a duplicate is moved into the DLQ, so the counters show a
> pending message when there is none in the persisted queue. When the
> scheduler kicks in 30 seconds after the broker goes idle, it sees a pending
> message and tries to fetch it from the DB, but the fetch returns 0 messages.
> The scheduler immediately sees the pending message again and repeats the DB
> fetch, again with no results. This loop is where the CPU is spinning.
> See the attached thread dump.
> So, in detail:
> It appears that after A is restarted and replays messages that have not been
> ACKed, B receives duplicate messages and sends them to the DLQ. Here is the
> warning from the log:
> {noformat}
>   WARN | duplicate message from store ID:host-lnx-59946-1415221396197-1:1:1:1:468, redirecting for dlq processing | org.apache.activemq.broker.region.Queue | ActiveMQ VMTransport: vm://broker1#11-1
> {noformat}
> After all messages are delivered and the brokers have been idle for 30
> seconds, the CPU on B is at 100%. If you look at the queues on B in the
> WebConsole, you see the following:
> {noformat}
>    Queue          Number Of          Number Of   Messages   Messages
>    Name           Pending Messages   Consumers   Enqueued   Dequeued
>    ActiveMQ.DLQ   1                  0           1          0
>    TEST.FOO       1                  1           469        468
> {noformat}
> On this test run, only one message was a duplicate. It was moved to the DLQ, 
> but the TEST.FOO counters show it as pending. The counters are out of sync 
> with actual messages in the persisted queue, because the duplicate message is 
> now in the DLQ and not in the TEST.FOO queue.
> At this point if you purge TEST.FOO, CPU on B goes back to normal because 
> this clears the pending message counter.
> +*Steps to reproduce*+
> Set up 2 brokers as follows:
>   *producer* ==> *broker-A* <== duplex network connection ==> *broker-B* ==> *consumer*
> 1) Download the binary distribution of AMQ 5.10.0 and extract 
> apache-activemq-5.10.0-bin.tar.gz
> 2) Create two brokers
> {noformat}
>  $ $ACTIVEMQ_HOME/bin/activemq create /path/to/brokers/broker-a
>  $ $ACTIVEMQ_HOME/bin/activemq create /path/to/brokers/broker-b
> {noformat}
> 3) Update broker-a to connect to broker-b with a duplex connection.
>    _You can use the attached *activemq.xml*_. It does the following:
> - Sets transport for broker-a to port 61610
> - Sets up networkConnector to connect to broker-b on 61616
> - Does not start jetty web console on broker-a to avoid port conflict
> broker-b is unmodified and defaults to port 61616.
> 4) Start the brokers
> {noformat}
>  $ broker-a/bin/broker-a start
>  $ broker-b/bin/broker-b start
> {noformat}
> 5) Start consumer connected to broker-b and producer connected to broker-a
> {noformat}
>  $ ant consumer -Durl=tcp://localhost:61616 -Ddurable=true
>  $ ant producer -Durl=tcp://localhost:61610 -Ddurable=true
> {noformat}
> 6) Stop broker-a before producer is finished sending messages, then restart
> {noformat}
>  $ broker-a/bin/broker-a stop
>  $ broker-a/bin/broker-a start
> {noformat}
> 7) Look at broker-b logs for duplicates, look at broker-b web console for 
> pending messages
>  http://localhost:8161/admin/queues.jsp
> 8) 30 seconds after going idle, broker-b CPU will go to 100%
> 9) Purge TEST.FOO on broker-b; the pending message count will reset and CPU
> will go back to normal.
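
The counter drift suspected in the description can be illustrated with a small toy model. This is only a sketch of the reported symptom, not ActiveMQ code: `ToyQueue`, `deliverNext`, `idleFetches`, `purge` and the `countDlqMove` flag are all hypothetical names invented for the illustration, and the "store" is just an in-memory deque. It assumes, as the report does, that pending is derived from the enqueue and dequeue counters and that moving a duplicate to the DLQ does not bump the dequeue counter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these names are ActiveMQ classes or methods.
class ToyQueue {
    private final Deque<String> store = new ArrayDeque<>(); // stands in for the persisted queue
    private final Set<String> seen = new HashSet<>();       // message IDs already delivered
    final Deque<String> dlq = new ArrayDeque<>();           // stands in for ActiveMQ.DLQ
    long enqueued;
    long dequeued;
    private final boolean countDlqMove; // hypothetical switch: count a DLQ move as a dequeue

    ToyQueue(boolean countDlqMove) { this.countDlqMove = countDlqMove; }

    // Pending is derived from the counters, not from the store contents.
    long pending() { return enqueued - dequeued; }

    void send(String id) { store.add(id); enqueued++; }

    // Take one message from the store; a duplicate ID is redirected to the DLQ.
    void deliverNext() {
        String id = store.poll();
        if (id == null) return;
        if (seen.add(id)) {
            dequeued++;
        } else {
            dlq.add(id);
            if (countDlqMove) dequeued++; // without this, pending() stays too high
        }
    }

    // Models the idle-time check: keep fetching while the counters claim work.
    // Returns how many fetches found nothing, capped so the demo terminates.
    int idleFetches(int maxFetches) {
        int emptyFetches = 0;
        while (pending() > 0 && emptyFetches < maxFetches) {
            if (store.isEmpty()) emptyFetches++; else deliverNext();
        }
        return emptyFetches;
    }

    // Models the purge workaround: empty the store and resync the counters.
    void purge() { store.clear(); dequeued = enqueued; }

    public static void main(String[] args) {
        ToyQueue q = new ToyQueue(false);
        for (int i = 1; i <= 468; i++) q.send("msg-" + i);
        q.send("msg-468"); // replayed duplicate
        for (int i = 0; i < 469; i++) q.deliverNext();
        System.out.println("pending=" + q.pending() + " dlq=" + q.dlq.size());
        System.out.println("empty fetches=" + q.idleFetches(1000));
    }
}
```

With `countDlqMove` set to `false` the model ends up with the same counters as the TEST.FOO row above (469 enqueued, 468 dequeued, 1 pending) and every idle fetch comes back empty until the cap is hit. Calling `purge()`, or constructing the queue with `countDlqMove` set to `true`, brings pending back to 0 so the idle check stops immediately.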



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
