5.0 (and later) Queue stops dispatching

Pete Schwamb Tue, 11 Mar 2008 10:09:30 -0700

I've been seeing some freezes on Queue dispatching, particularly with the latest 5.0 release and more recent snapshots. I've tried very hard to reproduce it reliably in a small test case, but it seems very timing dependent. I was able to reproduce at least one variant of it fairly reliably. I am using the default AMQMessageStore setup. Also, I'm using fuse-5.0.0.9, because the SNAPSHOT builds are failing even more spectacularly for me at the moment, though from what I've seen in SVN, I believe this is still an issue in the trunk.

There are two non-durable subscribers on the queue via stomp, and they consume more slowly than the producer, which publishes in bursts. After the first burst of 30 - 50k messages, I stop the producer and let the consumers catch up. Then I publish another burst of messages. This is usually where the freeze happens.


First, I usually get a message like the following:

ERROR RecoveryListenerAdapter - Message id ID:sand-52497-1205185863002-2:1:2:1:42787 could not be recovered from the data store - already dispatched


Then the queue stops dispatching.

Here's what I first saw in the debugger, after the "already dispatched" message appears:

a) On the Queue, messages.hasNext() returns false, so the doPageIn() method never pages anything in.
b) messages.hasNext() -> currentCursor.hasNext() -> fillBatch() -> doFillBatch() -> this.store.recoverNextMessages(this.maxBatchSize, this)
c) KahaReferenceStore recoverNextMessages gets null back from messageContainer.getNext(entry), because entry.nextItem = -1

However, the message store usually has many thousands of messages still in it, as evidenced by the 'size' attribute on DiskIndexLinkedList. So this is the first hint that the LinkedList is corrupt. I started looking more closely at DiskIndexLinkedList, and noticed the following incorrect (I think) behavior:
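
For anyone who wants to check for the same symptom, the size mismatch described above can be sketched as a small consistency check: walk the chain by following nextItem until -1 and compare the count with the recorded size. This is a toy in-memory model, not the Kaha code; the class LinkedIndexCheck, its Item type, and the helpers buildChain and countReachable are hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class LinkedIndexCheck {
    // Hypothetical stand-in for an index entry: its own offset plus the
    // offset of the next entry, where -1 means "end of list".
    static final class Item {
        final long offset;
        long nextItem = -1;
        Item(long offset) { this.offset = offset; }
    }

    // Builds an intact chain 0 -> 1 -> ... -> n-1 -> end, keyed by offset.
    static Map<Long, Item> buildChain(int n) {
        Map<Long, Item> store = new HashMap<>();
        for (long i = 0; i < n; i++) {
            Item item = new Item(i);
            item.nextItem = (i < n - 1) ? i + 1 : -1;
            store.put(i, item);
        }
        return store;
    }

    // Counts the entries reachable from 'head' by following nextItem.
    // The bound on 'count' stops the walk if the chain contains a cycle.
    static int countReachable(Map<Long, Item> store, long head) {
        int count = 0;
        long cursor = head;
        while (cursor != -1 && count <= store.size()) {
            Item item = store.get(cursor);
            if (item == null) {
                break; // dangling pointer: nothing stored at that offset
            }
            count++;
            cursor = item.nextItem;
        }
        return count;
    }

    public static void main(String[] args) {
        Map<Long, Item> store = buildChain(5);
        int recordedSize = store.size();
        System.out.println("intact: reachable=" + countReachable(store, 0)
                + " size=" + recordedSize); // intact: reachable=5 size=5

        // Simulate the symptom: an entry in the middle claims to be the end.
        store.get(1L).nextItem = -1;
        System.out.println("broken: reachable=" + countReachable(store, 0)
                + " size=" + recordedSize); // broken: reachable=2 size=5
    }
}
```

If the reachable count is smaller than the recorded size, some entry short of the real tail has nextItem = -1, which matches what the cursor sees when hasNext() returns false too early.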

In DiskIndexLinkedList.getNextEntry(IndexItem current), line 274 is "result = last". On some occasions result.nextItem is -1, and last.nextItem != -1. Shouldn't last.nextItem always be -1? I'm wondering if the opposite was intended: to update "last".
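
To make the difference between the two assignment directions concrete, here is a toy model. It assumes (an assumption, not verified against the Kaha source here) that two item instances compare equal when they refer to the same offset, so an "equal" cached copy and a freshly loaded copy can still disagree on nextItem. StaleTailDemo, Item, preferCached and refreshCache are hypothetical names; the offsets 40 and 50 are arbitrary.

```java
class StaleTailDemo {
    // Hypothetical stand-in for IndexItem. Equality is by offset only
    // (assumed), so equal instances may carry different nextItem values.
    static final class Item {
        final long offset;
        long nextItem;
        Item(long offset, long nextItem) {
            this.offset = offset;
            this.nextItem = nextItem;
        }
        @Override public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).offset == offset;
        }
        @Override public int hashCode() { return Long.hashCode(offset); }
    }

    Item last; // cached reference to what is believed to be the tail

    // Direction "result = last": hand back the cached instance.
    Item preferCached(Item loaded) {
        Item result = loaded;
        if (result != null && last != null && last.equals(result)) {
            result = last;
        }
        return result;
    }

    // Direction "last = result": refresh the cache from the loaded instance.
    Item refreshCache(Item loaded) {
        Item result = loaded;
        if (result != null && last != null && last.equals(result)) {
            last = result;
        }
        return result;
    }

    public static void main(String[] args) {
        // Same values as observed in the debugger: loaded copy says -1,
        // cached copy says something else.
        Item loaded = new Item(40, -1);

        StaleTailDemo a = new StaleTailDemo();
        a.last = new Item(40, 50);
        System.out.println(a.preferCached(loaded).nextItem); // 50
        System.out.println(a.last.nextItem);                 // 50

        StaleTailDemo b = new StaleTailDemo();
        b.last = new Item(40, 50);
        System.out.println(b.refreshCache(loaded).nextItem); // -1
        System.out.println(b.last.nextItem);                 // -1
    }
}
```

With the first direction the two copies stay out of sync and the caller sees the cached one; with the second, the cache is replaced by the loaded copy and both agree. The model only shows that the two directions behave differently once the copies diverge; it does not establish which copy is the correct one in Kaha.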


So I changed the following:

Index: src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java
===================================================================
--- src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java (revision 635580)
+++ src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java (working copy)
@@ -271,7 +271,7 @@
              }
              // essential last get's updated consistently
              if (result != null && last != null && last.equals(result)) {
-                       result = last;
+                       last = result;
              }
              return result;
      }

And indeed, I no longer get the "already dispatched" message, and queues continue dispatching after many cycles of the producer flowing 10s of thousands of messages through.

Hopefully this sheds some light on stability issues others may be having. I'm not sure I've fixed the problem 100%. Is anyone else seeing this?


-Pete
