Re: 5.0 (and later) Queue stops dispatching

Rob Davies Tue, 11 Mar 2008 12:44:30 -0700


On 11 Mar 2008, at 17:08, Pete Schwamb wrote:

I've been seeing some freezes on Queue dispatching, particularly with the latest 5.0 release and more recent snapshots. I've tried very hard to reproduce it reliably in a small test case, but it seems very timing dependent. I was able to reproduce at least one variant of it fairly reliably. I am using the default AMQMessageStore setup. Also, I'm using fuse-5.0.0.9, because the SNAPSHOT builds are failing even more spectacularly for me at the moment, though from what I've seen in SVN, I believe this is still an issue in the trunk.
There are two non-durable subscribers on the queue via stomp, and they consume more slowly than the producer, which publishes in bursts. After the first burst of 30 - 50k messages, I stop the producer and let the consumers catch up. Then I publish another burst of messages. This is usually where the freeze happens.
First, I usually get a message like the following:
ERROR RecoveryListenerAdapter - Message id ID:sand-52497-1205185863002-2:1:2:1:42787 could not be recovered from the data store - already dispatched
Then the queue stops dispatching.
Here's what I first saw in the debugger, after the "already dispatched" message appears:
a) on the Queue, messages.hasNext() returns false, so the doPageIn() method never pages anything in.
b) messages.hasNext() -> currentCursor.hasNext() -> fillBatch() -> doFillBatch() -> this.store.recoverNextMessages(this.maxBatchSize, this) -> this.store.recoverNextMessages(this.maxBatchSize, this)
c) KahaReferenceStore recoverNextMessages gets null back from messageContainer.getNext(entry), because entry.nextItem = -1
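The symptom in (c) can be illustrated with a toy model. This is only a sketch and not the Kaha code: TruncatedWalkSketch, countReachable and the offset map are hypothetical names made up for illustration. It assumes entries are chained by a "next offset" field, with -1 meaning "no next entry", and shows how one lost link makes a traversal stop early even though more entries are still stored.

```java
import java.util.HashMap;
import java.util.Map;

public class TruncatedWalkSketch {
    // Assumed sentinel for "no next entry", mirroring the nextItem = -1 seen in the debugger.
    static final long NOT_SET = -1;

    /** Follows next-offset links starting at 'first' and counts the reachable entries. */
    static int countReachable(Map<Long, Long> nextByOffset, long first) {
        int count = 0;
        long current = first;
        while (current != NOT_SET && nextByOffset.containsKey(current)) {
            count++;
            current = nextByOffset.get(current);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical list of four entries at offsets 0..3.
        Map<Long, Long> nextByOffset = new HashMap<>();
        nextByOffset.put(0L, 1L);
        nextByOffset.put(1L, NOT_SET); // link lost: this entry should have pointed at offset 2
        nextByOffset.put(2L, 3L);
        nextByOffset.put(3L, NOT_SET);

        int size = nextByOffset.size();
        int reachable = countReachable(nextByOffset, 0L);
        // prints "size=4, reachable=2"
        System.out.println("size=" + size + ", reachable=" + reachable);
    }
}
```

In this model the stored size and the number of entries a walk can reach disagree, which is the same kind of mismatch as a large 'size' alongside a cursor that reports nothing left to page in.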
However, the message store usually has many thousands of messages still in it, as evidenced by the 'size' attribute on DiskIndexLinkedList. So this is the first hint that the LinkedList is corrupt. I started looking more closely at DiskIndexLinkedList, and noticed the following incorrect (I think) behavior:
In DiskIndexLinkedList.getNextEntry(IndexItem current), line 274 is "result = last". On some occasions result.nextItem is -1, and last.nextItem != -1. Shouldn't last.nextItem always be -1? I'm wondering if the opposite was intended: to update "last".
So I changed the following:
Index: src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java
===================================================================
--- src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java (revision 635580)
+++ src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java (working copy)
@@ -271,7 +271,7 @@
             }
             // essential last get's updated consistently
             if (result != null && last != null && last.equals(result)) {
-                       result = last;
+                       last = result;
             }
             return result;
     }
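To make the difference between the two assignments concrete, here is a minimal sketch. It is not the ActiveMQ source: StaleTailSketch, Item, original and patched are hypothetical names, and it assumes that two entries compare equal when they share an offset, even if their nextItem fields differ. The values mirror the observation above, where the freshly read entry has nextItem = -1 and the cached "last" does not.

```java
public class StaleTailSketch {
    /** Toy stand-in for an index entry: equality is by offset only (an assumption). */
    static final class Item {
        final long offset;
        long nextItem;

        Item(long offset, long nextItem) {
            this.offset = offset;
            this.nextItem = nextItem;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).offset == offset;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(offset);
        }
    }

    Item last; // cached copy of the tail entry

    /** Behaviour before the patch: hand back the cached copy, discard the fresh one. */
    Item original(Item result) {
        if (result != null && last != null && last.equals(result)) {
            result = last;
        }
        return result;
    }

    /** Behaviour after the patch: refresh the cached copy from the fresh one. */
    Item patched(Item result) {
        if (result != null && last != null && last.equals(result)) {
            last = result;
        }
        return result;
    }

    public static void main(String[] args) {
        StaleTailSketch before = new StaleTailSketch();
        before.last = new Item(7, 42);              // cached copy, nextItem != -1
        Item r1 = before.original(new Item(7, -1)); // fresh read, nextItem == -1
        // prints "original: returned nextItem=42, last.nextItem=42"
        System.out.println("original: returned nextItem=" + r1.nextItem
                + ", last.nextItem=" + before.last.nextItem);

        StaleTailSketch after = new StaleTailSketch();
        after.last = new Item(7, 42);
        Item r2 = after.patched(new Item(7, -1));
        // prints "patched: returned nextItem=-1, last.nextItem=-1"
        System.out.println("patched: returned nextItem=" + r2.nextItem
                + ", last.nextItem=" + after.last.nextItem);
    }
}
```

With the original assignment the cached copy is never refreshed and is what the caller receives; with the patched assignment the cache picks up the freshly read state. Whether this is the whole story in the real class is not something the toy model can show.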
And indeed, I no longer get the "already dispatched" message, and queues continue dispatching after many cycles of the producer flowing 10s of thousands of messages through.
Hopefully this sheds some light on stability issues others may be having. I'm not sure I've fixed the problem 100%. Is anyone else seeing this?
-Pete

Beautiful!!! Thx Pete - love it when other folks fix my bugs ;)


cheers,

Rob

http://open.iona.com/ -Enterprise Open Integration
http://rajdavies.blogspot.com/
