Re: IndexOutOfBoundsException in AMQMessageStore using 20071028.173739

Rob Davies Fri, 02 Nov 2007 14:11:03 -0800

I wonder if you could take the latest again?


cheers,

Rob

http://rajdavies.blogspot.com/



On Nov 2, 2007, at 7:26 AM, Andrew Miller wrote:

Rob,
Yeah, in general, I've been deleting the entire local storedirectory inbetween tests, since we're just testing it is not critical that themessagesare persisted. (with the message queue down of course)Interestingly, Irestarted the broker to try to fix this problem. It marked thestore as badand regenerated the index from the journal (about an hour+ with~1.5-2m
messages).
The consumers started working again at that point, for a while.After itreached ~7200 dispatched messages, it really blew up. I got thisin the
wrapper.log:
INFO | jvm 1 | 2007/11/01 20:30:09 |java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.MessageIdMarshaller.writePayload(MessageIdMarshaller.java:38)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.MessageIdMarshaller.writePayload(MessageIdMarshaller.java:29)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashEntry.write(HashEntry.java:93)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashPage.write(HashPage.java:123)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashIndex.writeFullPage(HashIndex.java:350)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashPageInfo.end(HashPageInfo.java:98)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashBin.end(HashBin.java:299)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashBin.put(HashBin.java:153)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashIndex.store(HashIndex.java:241)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.container.MapContainerImpl.put(MapContainerImpl.java:303)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.kahadaptor.KahaReferenceStore.addMessageReference(KahaReferenceStore.java:113)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore$4.execute(AMQMessageStore.java:341)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.util.TransactionTemplate.run(TransactionTemplate.java:44)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore.doAsyncWrite(AMQMessageStore.java:330)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore.asyncWrite(AMQMessageStore.java:300)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore$1.iterate(AMQMessageStore.java:92)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:9
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
After that stack trace, I start getting a lot moreNullPointerExceptions in
the logs, however, they don't have a corresponding stack trace:

INFO   | jvm 1    | 2007/11/01 20:30:45 | WARN  AMQMessageStore
- Message could not be added to long term store: null
INFO | jvm 1 | 2007/11/01 20:30:45 |java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:30:45 | WARN  AMQMessageStore
- Message could not be added to long term store: null
INFO | jvm 1 | 2007/11/01 20:30:45 |java.lang.NullPointerException
Then, a different kind of exception, with stack trace:

INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
- Failed to fill batch
INFO | jvm 1 | 2007/11/01 20:31:20 |java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR IG
- Failed to page in more queue messages
INFO   | jvm 1    | 2007/11/01 20:31:20 | java.lang.RuntimeException:
java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.cursors.QueueStorePrefetch.hasNext(QueueStorePrefetch.java:112)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext(StoreQueueCursor.java:129)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.buildList(Queue.java:1022)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1005)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1068)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.iterate(Queue.java:941)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
INFO   | jvm 1    | 2007/11/01 20:31:20 | Caused by:
java.lang.NullPointerException
A couple other stack traces that look a lot like that, and thentons of:
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
- Failed to fill batch
INFO | jvm 1 | 2007/11/01 20:31:20 |java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
- Failed to fill batch
INFO | jvm 1 | 2007/11/01 20:31:20 |java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
- Failed to fill batch
(Sorry, I haven't searched JIRA/Google for this one yet so this maybe a
known issue)

I reran most of the test with the journaledJDBC connector but it's not
nearly as fast as the local store so I'm hoping to get thisworking. :-)I've made the indexBinSize change you've recommended and I'll re-run thetests tomorrow. BTW thanks for putting together that amq-message-store page
on the wiki. Thanks for the assistance!

-Andy









rajdavies wrote:
Hi andy,

did you start with a completely new AMQ Store data directory ? - You
could delete the references store only - e.g. the Recovery section in
http://activemq.apache.org/amq-message-store.html
Also - as you are storing a few million messages - you might want to
change the number of indexBinSize - the default number of has bins -
parameter to something large - like 16000 (the default is 1024).

I'l see if I can reproduce your latest problem with the
RecoveryListenerAdaptor.

cheers,

Rob

http://rajdavies.blogspot.com/



On Nov 2, 2007, at 2:16 AM, Andrew Miller wrote:
Thanks for the fix!  We're running the test now...  Parameters:

SVN version: 591163
Client jar: Updated to 5.0-20071028.173739

We haven't seen any of the original IndexOutOfBounds exception
messages,
even after about 1.2 million Enqueues.   However, the consumers
appear to be
stuck and unable to consume.     We had a RecoveryListenerAdapter
message
show up in the log, though I'm not sure this correlates with the
inability
of the consumers to consume these messages.

2007-11-01 18:10:42,645 [68.194.21:40095] ERROR
RecoveryListenerAdapter
- Message id ID:rlab-0007.xxxx.com-48174-119396578373
6-0:2:1:1:11 could not be recovered from the data store!

I even wrote a little perl client that does nothing but subscribe
to this
queue, consumes one message, and exits.  The script just blocks and
then
times out, even though there are many messages in the queue:

rlab-0007:/opt/activemq/bin# ./query -QQueue=IG
ACTIVEMQ_HOME: /opt/activemq
ACTIVEMQ_BASE: /opt/activemq
Type = Queue
DispatchCount = 213
Destination = IG
MaxEnqueueTime = 1371
QueueSize = 926441
Name = IG
DequeueCount = 213
MemoryPercentageUsed = 0
ConsumerCount = 19
MemoryLimit = 30198988
EnqueueCount = 1190424
MinEnqueueTime = 1
AverageEnqueueTime = 196.80751173708921

Here's the one-off perl script and the output: (I assume there's no
reason I
would be unable to consume a message sent via JMS in stomp/perl?)
(redacting script since my previous mail on this subject got
rejected for
being too spammy - suffice to say it simply subscribes to thequeue in
question and does a stomp->receive_frame())

[EMAIL PROTECTED]:/opt/amiller$ ./c1.pl
connected
could not receive message after 20 seconds at ./c1.pl line 17,
<GEN0> line
4.

We also had this problem with the 20071031 snapshot, but at that
time I
thought it was because we were using the 4.1.1 client jars.  BTW
even though
we didn't see this issue in 20071002, we needed to upgrade to solve
AMQ-1479.

Thanks again!
-Andy




rajdavies wrote:
There's a potential fix for this I've just added to trunk -would it
be possible to build the latest and test ?


cheers,

Rob

http://rajdavies.blogspot.com/



On Nov 1, 2007, at 9:47 PM, Andrew Miller wrote:
We've been using ActiveMQ in an infrastructure refresh project,and
have run
into a nasty looking bug, it looks like we are dropping
messages.    We have
a few fast producers that will produce approximately 4-5 million
persistent
messages over the course of about 2.5 hours, while many slower
consumers
will consume these messages over the course of a day or so.

We are using Jencks/Spring for the producers and consumers, using
the
openwire protocol.  For this test, we were using the AMQ 4.1.1
client jars,
if that matters. (We will try updating to the new AMQ5 clientjars
today)

About halfway through the production run, we started getting
IndexOutOfBounds exceptions in the log pretty frequently.Later on
the
consumer tries to get these messages but appears to fail.
(exception messages trimmed)
--
View this message in context: http://www.nabble.com/
IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739-
tf4734264s2354.html#a13541167
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
--
View this message in context: http://www.nabble.com/IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739-tf4734264s2354.html#a13543207
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: IndexOutOfBoundsException in AMQMessageStore using 20071028.173739

Reply via email to