On 11 Mar 2008, at 17:08, Pete Schwamb wrote:
I've been seeing some freezes on Queue dispatching, particularly
with the latest 5.0 release and more recent snapshots. I've tried
very hard to reproduce it reliably in a small test case, but it
seems very timing dependent. I was able to reproduce at least one
variant of it fairly reliably. I am using the default
AMQMessageStore setup. Also, I'm using fuse-5.0.0.9, because the
SNAPSHOT builds are failing even more spectacularly for me at the
moment, though from what I've seen in SVN, I believe this is still
an issue in the trunk.
There are two non-durable subscribers on the queue via stomp, and
they consume more slowly than the producer, which publishes in
bursts. After the first burst of 30 - 50k messages, I stop the
producer and let the consumers catch up. Then I publish another
burst of messages. This is usually where the freeze happens.
First, I usually get a message like the following:
ERROR RecoveryListenerAdapter - Message id
ID:sand-52497-1205185863002-2:1:2:1:42787 could not be recovered
from the data store - already dispatched
Then the queue stops dispatching.
Here's what I first saw in the debugger, after the "already
dispatched" message appears:
a) on the Queue, messages.hasNext() returns false, so the doPageIn()
method never pages anything in.
b) messages.hasNext() -> currentCursor.hasNext() -> fillBatch() ->
doFillBatch() -> this.store.recoverNextMessages(this.maxBatchSize,
this) ->
this.store.recoverNextMessages(this.maxBatchSize, this)
c) KahaReferenceStore recoverNextMessages gets null back from
messageContainer.getNext(entry), because entry.nextItem = -1
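For readers following along, the stopping condition in (c) can be pictured with a toy model. This is only a sketch under assumptions: the map of offsets to next pointers and the names ChainWalkSketch, countReachable and NO_NEXT are made up for illustration and are not the Kaha classes. It shows how a premature -1 makes a walk stop early even though more entries exist.

```java
import java.util.HashMap;
import java.util.Map;

public class ChainWalkSketch {
    // Hypothetical marker for "no next entry", mirroring nextItem = -1.
    static final long NO_NEXT = -1;

    /** Counts the entries reachable from 'first' by following next pointers. */
    static int countReachable(Map<Long, Long> nextByOffset, long first) {
        int count = 0;
        long current = first;
        // The size bound guards against an accidental cycle in the toy data.
        while (current != NO_NEXT && count < nextByOffset.size()) {
            Long next = nextByOffset.get(current);
            if (next == null) {
                break; // dangling pointer: nothing stored at this offset
            }
            count++;
            current = next;
        }
        return count;
    }

    public static void main(String[] args) {
        Map<Long, Long> nextByOffset = new HashMap<>();
        nextByOffset.put(0L, 1L);
        nextByOffset.put(1L, 2L);
        nextByOffset.put(2L, NO_NEXT); // premature end of the chain
        nextByOffset.put(3L, 4L);      // these two entries are stored
        nextByOffset.put(4L, NO_NEXT); // but no longer reachable
        System.out.println("stored=" + nextByOffset.size()
                + " reachable=" + countReachable(nextByOffset, 0L));
        // prints "stored=5 reachable=3"
    }
}
```

A mismatch between the stored count and the reachable count is the kind of inconsistency described next.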
However, the message store usually has many thousands of messages
still in it, as evidenced by the 'size' attribute on
DiskIndexLinkedList. So this is the first hint that the LinkedList
is corrupt. I started looking more closely at DiskIndexLinkedList,
and noticed the following incorrect (I think) behavior:
In DiskIndexLinkedList.getNextEntry(IndexItem current), line 274 is
"result = last". On some occasions result.nextItem is -1, and
last.nextItem != -1. Shouldn't last.nextItem always be -1? I'm
wondering if the opposite was intended: to update "last".
So I changed the following:
Index: src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java
===================================================================
--- src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java (revision 635580)
+++ src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java (working copy)
@@ -271,7 +271,7 @@
         }
         // essential last get's updated consistently
         if (result != null && last != null && last.equals(result)) {
-            result = last;
+            last = result;
         }
         return result;
     }
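To make the difference between the two assignments concrete, here is a toy sketch. It is not the ActiveMQ code: Item, TailCache, original() and patched() are hypothetical names, and it assumes that two entries compare equal when their offsets match even if their nextItem values differ. It only demonstrates which object the caller gets back and what the cached "last" holds afterwards.

```java
public class LastVsResultSketch {
    /** Toy index entry: equality is by offset only (an assumption). */
    static final class Item {
        final long offset;
        final long nextItem;

        Item(long offset, long nextItem) {
            this.offset = offset;
            this.nextItem = nextItem;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).offset == offset;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(offset);
        }
    }

    /** Holds a cached tail entry, standing in for the "last" field. */
    static final class TailCache {
        Item last;

        TailCache(Item last) {
            this.last = last;
        }

        /** Shape of the original line: the caller receives the cached copy. */
        Item original(Item result) {
            if (result != null && last != null && last.equals(result)) {
                result = last;
            }
            return result;
        }

        /** Shape of the patched line: the cache takes the loaded copy. */
        Item patched(Item result) {
            if (result != null && last != null && last.equals(result)) {
                last = result;
            }
            return result;
        }
    }

    public static void main(String[] args) {
        // Cached tail whose nextItem is not -1, and a loaded copy whose nextItem is -1.
        TailCache before = new TailCache(new Item(100, 200));
        Item r1 = before.original(new Item(100, -1));
        System.out.println("original: returned nextItem=" + r1.nextItem
                + ", last.nextItem=" + before.last.nextItem);
        // prints "original: returned nextItem=200, last.nextItem=200"

        TailCache after = new TailCache(new Item(100, 200));
        Item r2 = after.patched(new Item(100, -1));
        System.out.println("patched: returned nextItem=" + r2.nextItem
                + ", last.nextItem=" + after.last.nextItem);
        // prints "patched: returned nextItem=-1, last.nextItem=-1"
    }
}
```

In this toy model the original form leaves "last" untouched, so a "last" whose nextItem is not -1 stays that way, while the patched form replaces it with the loaded entry.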
And indeed, I no longer get the "already dispatched" message, and
queues continue dispatching after many cycles of the producer
flowing tens of thousands of messages through.
Hopefully this sheds some light on stability issues others may be
having. I'm not sure I've fixed the problem 100%. Is anyone else
seeing this?
-Pete
Beautiful!!! Thx Pete - love it when other folks fix my bugs ;)
cheers,
Rob
http://open.iona.com/ -Enterprise Open Integration
http://rajdavies.blogspot.com/