[
https://issues.apache.org/activemq/browse/AMQ-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40104
]
David Sitsky commented on AMQ-1251:
-----------------------------------
Hiram - thanks for your comment. Your unit test confirms that what we are
dealing with here is a race condition somewhere in the ActiveMQ code.
Although the original unit tests do have excessive synchronization, they are
not invalid code from what I can tell, and they should run to completion.
I started looking at the ActiveMQ code under a debugger and noticed that when
both workers are stuck, each has a non-empty pending queue in which "dropped"
is true for every message reference. When the dispatcher sends the next
message from the master, it is just added to the pending queue of each worker,
and according to the logic I saw, it is not dispatched immediately because the
pending queue is not empty.
I tried invoking the gc() operation on various objects from JMX, but it did
not seem to clear out any of the messages.
At this stage nothing more happens: the pending queues stay non-empty and the
new message is never delivered.
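
To make the suspected stall concrete, here is a minimal toy model of the
behaviour described above. It is not ActiveMQ code: Ref, ToySubscription and
drainSkippingDropped are hypothetical names, and the "deliver immediately only
when nothing is pending" rule is an assumption based on what I saw in the
debugger. The drain method is only one possible remedy, not a proposed patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PendingQueueStallSketch {

    /** Hypothetical stand-in for a message reference; not an ActiveMQ class. */
    static final class Ref {
        final String id;
        final boolean dropped;

        Ref(String id, boolean dropped) {
            this.id = id;
            this.dropped = dropped;
        }
    }

    /** Toy subscription that dispatches immediately only when nothing is pending. */
    static final class ToySubscription {
        final Deque<Ref> pending = new ArrayDeque<>();
        final List<String> delivered = new ArrayList<>();

        void add(Ref ref) {
            if (pending.isEmpty()) {
                delivered.add(ref.id);  // fast path: deliver right away
            } else {
                pending.addLast(ref);   // otherwise queue behind the existing entries
            }
        }

        /** One possible remedy: discard dropped references and deliver the rest. */
        void drainSkippingDropped() {
            Ref ref;
            while ((ref = pending.pollFirst()) != null) {
                if (!ref.dropped) {
                    delivered.add(ref.id);
                }
            }
        }
    }

    public static void main(String[] args) {
        ToySubscription sub = new ToySubscription();
        // Simulate the stuck state: only dropped references are pending.
        sub.pending.add(new Ref("stale-1", true));
        sub.pending.add(new Ref("stale-2", true));

        // The new message queues behind the dropped entries and is not delivered.
        sub.add(new Ref("new-message", false));
        System.out.println("delivered after add:   " + sub.delivered);

        // Once the dropped entries are cleared, the new message goes out.
        sub.drainSkippingDropped();
        System.out.println("delivered after drain: " + sub.delivered);
    }
}
```

In this model nothing ever removes the dropped entries unless something
explicitly drains the queue, which matches the symptom that the new message
just sits there.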
> Broker stops delivering messages to some consumers
> --------------------------------------------------
>
> Key: AMQ-1251
> URL: https://issues.apache.org/activemq/browse/AMQ-1251
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker
> Affects Versions: 4.1.0
> Environment: WinXP
> Reporter: Vadim Pesochinskiy
> Assignee: Rob Davies
> Fix For: 5.0.0
>
> Attachments: QueueWorkerPrefetchTest.java, TestActiveMQ.java,
> TestActiveMQSyncReceive.java
>
>
> I have around 40 consumers taking messages from a single queue. After a while
> 1 or 2 consumers stop receiving any messages. Going to JMX and stopping the
> corresponding connection causes a re-connect and messages are delivered again.
> I reproduced it twice in the QA environment and now it has happened in
> production. I tried to instrument the code and set the logging to debug, but
> that changed the timing and I failed to reproduce it after the changes.
> I suspect that the runtime association between the Queue and Consumer objects
> is lost on the Broker side.
> One of the suspects is the empty catch block in the RoundRobinDispatchPolicy
> class (line 64). It is possible that the CopyOnWrite array list is messed up
> and it fails when the removed consumer is added back.
> BTW a CopyOnWrite list is good when you mostly read, but not so good when you
> write for every message delivery, and empty catch blocks are bad in any case.
>     if (firstMatchingConsumer != null) {
>         // Rotate the consumer list.
>         try {
>             consumers.remove(firstMatchingConsumer);
>             consumers.add(firstMatchingConsumer);
>         } catch (Throwable bestEffort) {
>         }
>     }
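
For comparison with the rotation quoted above, here is a minimal sketch of a
rotation that treats remove-then-add as one step and reports the outcome
instead of swallowing it. This is only an illustration, not the actual
RoundRobinDispatchPolicy code or a proposed fix: RotateConsumersSketch and
rotateToBack are hypothetical names, and whether this rotation is really the
cause of the lost consumers is unconfirmed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class RotateConsumersSketch {

    /**
     * Moves the given consumer to the tail of the list.
     * Returns false, without re-adding, if the consumer was no longer in the
     * list (for example because it was removed concurrently).
     * The lock only excludes other callers of this method on the same list;
     * other code that modifies the list directly is not blocked by it.
     */
    static <T> boolean rotateToBack(List<T> consumers, T firstMatchingConsumer) {
        synchronized (consumers) {
            if (!consumers.remove(firstMatchingConsumer)) {
                return false;
            }
            consumers.add(firstMatchingConsumer);
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> consumers = new CopyOnWriteArrayList<>();
        consumers.add("a");
        consumers.add("b");
        consumers.add("c");

        boolean rotated = rotateToBack(consumers, "a");
        System.out.println(rotated + " " + consumers);

        // A consumer that has already gone away is not silently added back.
        boolean rotatedMissing = rotateToBack(consumers, "z");
        System.out.println(rotatedMissing + " " + consumers);
    }
}
```

Returning a boolean leaves it to the caller to decide what to do when the
consumer has disappeared; the point is only that the failure becomes visible
rather than vanishing in an empty catch block.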