[ 
https://issues.apache.org/activemq/browse/AMQ-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Cooper reopened AMQ-2475:
---------------------------------


I am still able to consistently reproduce this deadlock on our system with 
both version 5.3.0 and 5.3.2.  The patch loops in TopicSubscription's add 
method while holding the matchedListMutex until the PendingMessageCursor 
"matched" is no longer full.  The code then calls addMessageLast on the 
"matched" instance, assuming that the matchedListMutex prevents any other 
thread from taking that space.  This assumption is wrong: I have found that 
when a FilePendingMessageCursor is in use, addMessageLast can end up calling 
systemUsage.getTempUsage().waitForSpace(), and temp usage can be full at that 
point with no way to shrink, because monitors are already held earlier in the 
stack.  The loop therefore spins forever and the system is deadlocked.
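
The hazard can be reproduced outside the broker: if one thread blocks 
indefinitely while holding a monitor, any other thread that needs that 
monitor (here, the consumer's ack/dispatch path) is starved.  A minimal 
sketch in plain Java (not ActiveMQ code; the names mirror the broker's 
identifiers but the classes are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class MutexStarvation {
    static final ReentrantLock matchedListMutex = new ReentrantLock();

    /** Returns true if the "consumer" thread managed to take the lock. */
    static boolean ackPathCanProceed() throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread producerAdd = new Thread(() -> {
            matchedListMutex.lock();      // add() enters matchedListMutex
            try {
                lockHeld.countDown();
                Thread.sleep(1000);       // stands in for waitForSpace() that never returns
            } catch (InterruptedException ignored) {
            } finally {
                matchedListMutex.unlock();
            }
        });
        producerAdd.start();
        lockHeld.await();
        // dispatchMatched() needs the same mutex to process acks and drain space
        boolean acquired = matchedListMutex.tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            matchedListMutex.unlock();
        }
        producerAdd.join();
        return acquired;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ack path proceeded: " + ackPathCanProceed());
        // prints: ack path proceeded: false
    }
}
```

In the broker the "sleep" never ends, because the only thing that could free 
temp space is the very thread that cannot take the mutex.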

To work around this issue, I switched to VM cursors, which don't rely on this 
shared pool of temp file storage, and I haven't seen the deadlock since.
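
In activemq.xml terms, the workaround is a destination policy along these 
lines (assuming the standard 5.x destination-policy syntax; adjust the topic 
selector to your setup):

```xml
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <!-- use in-memory cursors for topic subscribers instead of file cursors -->
      <policyEntry topic=">">
        <pendingSubscriberPolicy>
          <vmCursor/>
        </pendingSubscriberPolicy>
      </policyEntry>
    </policyEntries>
  </policyMap>
</destinationPolicy>
```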

I am new to this project and still trying to understand the code fully, but 
this is what I have found.  I think the wait-for-space looping happens too 
early in the stack: the matchedListMutex does not lock out other threads that 
use temp storage.  I'm not sure what the correct fix is, but short of a 
significant rework, the best I can suggest is to have addMessageLast throw 
some kind of exception or return a failure value when space is not available, 
so the calling method can release the matchedListMutex (by calling wait) and 
try again.  addMessageLast would then call waitForSpace with a small timeout 
rather than an infinite one.
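
The suggested contract could be expressed roughly as follows.  This is a 
sketch only: tryAddMessageLast is a hypothetical replacement for the cursor's 
addMessageLast, and a BlockingQueue stands in for the "matched" cursor with 
its capacity modelling the temp-store limit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class TimedAddSketch {
    private final Object matchedListMutex = new Object();
    // stands in for the "matched" cursor; capacity models available temp space
    private final BlockingQueue<Object> matched = new ArrayBlockingQueue<>(2);

    /** Hypothetical replacement for matched.addMessageLast(node): waits for
     *  space only briefly and reports failure instead of blocking forever. */
    boolean tryAddMessageLast(Object node, long timeoutMs)
            throws InterruptedException {
        return matched.offer(node, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** The calling add() loops, releasing matchedListMutex between attempts
     *  via wait(), so a dispatch thread can take the mutex and drain space. */
    void add(Object node) throws InterruptedException {
        synchronized (matchedListMutex) {
            while (!tryAddMessageLast(node, 50)) {
                matchedListMutex.wait(50);  // releases the monitor while waiting
            }
        }
    }
}
```

The key difference from the current patch is that a full cursor is surfaced 
to the caller as a recoverable condition rather than being absorbed by an 
unbounded waitForSpace() under the mutex.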

> If tmp message store fills up, broker can deadlock while producers wait on 
> disk space and consumers wait on acks
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-2475
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2475
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, Message Store, Transport
>    Affects Versions: 5.3.0
>         Environment: Tested on Windows XP with JDK 1.6.0_13, but fairly 
> sure it will be an issue on all platforms
>            Reporter: Martin Murphy
>            Assignee: Rob Davies
>             Fix For: 5.3.1, 5.4.0
>
>         Attachments: activemq.xml, hangtest.zip, Queue.java, 
> Queue.patchfile.txt, Topic.java, Topic.patchfile.txt, TopicSubscription.java, 
> TopicSubscription.patchfile.txt
>
>
> I will attach a simple project that shows this. In the test the tmp space 
> is set to 32 MB and two threads are created: one constantly produces 1 KB 
> messages and the other consumes them, but sleeps for 100 ms between reads; 
> note that producer flow control is turned off as well. The goal here is for 
> the producers to block while the consumers read the remaining messages from 
> the broker and catch up, which in turn frees up the disk space and allows 
> the producer to send more messages. This config means that you can bound 
> the broker by disk space rather than memory usage.
> Unfortunately, in this test using topics, while the broker is reading in a 
> message from the producer it has to lock the matched list it is adding to. 
> This detail is abstracted away from the Topic's point of view, which 
> doesn't realize the add may block on the file system. 
> {code}
>     public void add(MessageReference node) throws Exception { //... snip ...
>             if (maximumPendingMessages != 0) {
>                 synchronized (matchedListMutex) {   // We have this mutex
>                     matched.addMessageLast(node); // ends up waiting for space
>                     // NOTE - be careful about the slaveBroker!
>                     if (maximumPendingMessages > 0) {
> {code}
> Meanwhile the consumer is sending acknowledgements for the 10 messages it 
> just read in (the configured prefetch) from the same topic, but since they 
> also modify the same list in the topic this waits as well on the mutex held 
> to service the producer:
> {code}
>     private void dispatchMatched() throws IOException {       
>         synchronized (matchedListMutex) {  // never gets past here.
>             if (!matched.isEmpty() && !isFull()) {
> {code}
> This is a fairly classic deadlock. The trick now is how to resolve it, 
> given that the topic isn't aware that its list may need to wait for the 
> file system to clean up.
