[
https://issues.apache.org/jira/browse/AMQ-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pete Bertrand updated AMQ-5424:
-------------------------------
Fix Version/s: NEEDS_REVIEW
> Broker at 100% CPU when idle after Network Connection reconnect with
> duplicates sent
> ------------------------------------------------------------------------------------
>
> Key: AMQ-5424
> URL: https://issues.apache.org/jira/browse/AMQ-5424
> Project: ActiveMQ
> Issue Type: Bug
> Affects Versions: 5.10.0
> Reporter: Pete Bertrand
> Fix For: NEEDS_REVIEW
>
> Attachments: activemq.xml, thread-dump.txt
>
>
> In a network of 2 brokers (A and B) with durable queued messages
> going from A to B over a duplex NetworkConnector,
> if A is stopped and restarted while messages are in-flight,
> and if replayed messages from A are recognized as duplicates on B,
> then 30 seconds after B goes idle, B's CPU goes to 100%.
> I have attached the thread dump to the ticket.
> From what I have been able to figure out, the dequeue counter is not
> incremented when a duplicate is moved to the DLQ, so the counters show a
> pending message when there is none in the persisted queue. When the scheduler
> kicks in 30 seconds after the broker goes idle, it sees a pending message and
> tries to fetch it from the DB, but the fetch returns 0 messages. The scheduler
> still sees a pending message, so it immediately fetches again, again with no
> results. This loop is where the CPU is spinning.
> So, in detail:
> It appears that after A is restarted and it replays messages that have not
> been ACKed,
> B receives duplicate messages and sends them to the DLQ. Here is the warning
> from the log:
> {noformat}
> WARN | duplicate message from store
> ID:host-lnx-59946-1415221396197-1:1:1:1:468, redirecting for dlq processing |
> org.apache.activemq.broker.region.Queue | ActiveMQ VMTransport:
> vm://broker1#11-1
> {noformat}
> After all messages are delivered, the brokers have been idle for 30 seconds,
> and the CPU on B is at 100%, the WebConsole shows the following for the queues
> on B:
> {noformat}
> Name           Number Of          Number Of   Messages   Messages
>                Pending Messages   Consumers   Enqueued   Dequeued
> ActiveMQ.DLQ   1                  0           1          0
> TEST.FOO       1                  1           469        468
> {noformat}
> On this test run, only one message was a duplicate. It was moved to the DLQ,
> but the TEST.FOO counters still show it as pending. The counters are out of
> sync with the actual messages in the persisted queue, because the duplicate
> message is now in the DLQ and not in the TEST.FOO queue.
> At this point if you purge TEST.FOO, CPU on B goes back to normal because
> this clears the pending message counter.
> +*Steps to reproduce*+
> Set up 2 brokers as follows:
> *producer* ==> *broker-A* <== duplex network connection ==> *broker-B*
> ==> *consumer*
> 1) Download the binary distribution of AMQ 5.10.0 and extract
> apache-activemq-5.10.0-bin.tar.gz
> 2) Create two brokers
> {noformat}
> $ $ACTIVEMQ_HOME/bin/activemq create /path/to/brokers/broker-a
> $ $ACTIVEMQ_HOME/bin/activemq create /path/to/brokers/broker-b
> {noformat}
> 3) Update broker-a to connect to broker-b with a duplex connection.
> _You can use the attached *activemq.xml*_. It does the following:
> - Sets the transport for broker-a to port 61610
> - Sets up a networkConnector to connect to broker-b on 61616
> - Does not start the jetty web console on broker-a, to avoid a port conflict
> broker-b is unmodified and defaults to port 61616.
> 4) Start the brokers
> {noformat}
> $ broker-a/bin/broker-a start
> $ broker-b/bin/broker-b start
> {noformat}
> 5) Start consumer connected to broker-b and producer connected to broker-a
> {noformat}
> $ ant consumer -Durl=tcp://localhost:61616 -Ddurable=true
> $ ant producer -Durl=tcp://localhost:61610 -Ddurable=true
> {noformat}
> 6) Stop broker-a before the producer has finished sending messages, then restart it
> {noformat}
> $ broker-a/bin/broker-a stop
> $ broker-a/bin/broker-a start
> {noformat}
> 7) Look at broker-b logs for duplicates, look at broker-b web console for
> pending messages
> http://localhost:8161/admin/queues.jsp
> 8) 30 seconds after going idle, broker-b CPU will go to 100%
> 9) Purge TEST.FOO on broker-b; the pending message count will reset and CPU
> will go back to normal.
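
The counter drift described above can be illustrated with a small toy model. This is only a sketch of the reported symptom, not ActiveMQ code: the class PendingCounterSpin and all of its methods are hypothetical, the "store" is an in-memory deque standing in for the persistent queue, and the assumption that purge resets the counters is taken from the observation in the report rather than from the broker source. A cap on empty fetches replaces the unbounded loop so the example terminates.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the reported counter drift. Hypothetical names; not ActiveMQ code. */
public class PendingCounterSpin {
    private long enqueued;
    private long dequeued;
    // Stand-in for the persisted queue.
    private final Deque<String> store = new ArrayDeque<>();

    public void enqueue(String id) {
        store.add(id);
        enqueued++;
    }

    /** Normal delivery: the message leaves the store and is counted as dequeued. */
    public void deliver() {
        if (store.poll() != null) {
            dequeued++;
        }
    }

    /** Duplicate handling as described in the report: removed from the store, dequeue counter untouched. */
    public void moveDuplicateToDlq(String id) {
        store.remove(id);
    }

    public long pending() {
        return enqueued - dequeued;
    }

    /**
     * Keeps fetching while the counters claim pending work.
     * Returns the number of fetches that found nothing, capped at maxEmptyFetches.
     */
    public int pageInUntilIdle(int maxEmptyFetches) {
        int emptyFetches = 0;
        while (pending() > 0 && emptyFetches < maxEmptyFetches) {
            if (store.isEmpty()) {
                emptyFetches++; // counters say "pending", store has nothing: the spin
            } else {
                deliver();
            }
        }
        return emptyFetches;
    }

    /** Assumed purge behaviour: empty the store and bring the counters back in sync. */
    public void purge() {
        store.clear();
        dequeued = enqueued;
    }

    public static void main(String[] args) {
        PendingCounterSpin queue = new PendingCounterSpin();
        for (int i = 1; i <= 468; i++) {
            queue.enqueue("msg-" + i);
        }
        queue.enqueue("duplicate");          // 469 enqueued in total
        queue.moveDuplicateToDlq("duplicate");

        int spins = queue.pageInUntilIdle(1000);
        System.out.println("pending=" + queue.pending() + " emptyFetches=" + spins);

        queue.purge();
        spins = queue.pageInUntilIdle(1000);
        System.out.println("pending=" + queue.pending() + " emptyFetches=" + spins);
    }
}
```

In this model the first pass delivers 468 messages and then hits the cap of 1000 empty fetches with one message still counted as pending, matching the 469/468 counters in the table above; after the purge the pending count is 0 and no fetches are attempted.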
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)