I wonder if you could take the latest again?

cheers,

Rob

http://rajdavies.blogspot.com/



On Nov 2, 2007, at 7:26 AM, Andrew Miller wrote:



Rob,

Yeah, in general, I've been deleting the entire local store directory in between tests, since we're just testing it is not critical that the messages are persisted. (with the message queue down of course) Interestingly, I restarted the broker to try to fix this problem. It marked the store as bad and regenerated the index from the journal (about an hour+ with ~1.5-2m
messages).

The consumers started working again at that point, for a while. After it reached ~7200 dispatched messages, it really blew up. I got this in the
wrapper.log:

INFO | jvm 1 | 2007/11/01 20:30:09 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.MessageIdMarshaller.writePayload (MessageIdMarshaller.java:38)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.MessageIdMarshaller.writePayload (MessageIdMarshaller.java:29)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashEntry.write (HashEntry.java:93)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashPage.write (HashPage.java:123)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashIndex.writeFullPage (HashIndex.java:350)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashPageInfo.end (HashPageInfo.java:98)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashBin.end(HashBin.java:299)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashBin.put(HashBin.java:153)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.index.hash.HashIndex.store (HashIndex.java:241)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.kaha.impl.container.MapContainerImpl.put (MapContainerImpl.java:303)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.kahadaptor.KahaReferenceStore.addMessageRefe rence(KahaReferenceStore.java:113)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore$4.execute (AMQMessageStore.java:341)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.util.TransactionTemplate.run (TransactionTemplate.java:44)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore.doAsyncWrite (AMQMessageStore.java:330)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore.asyncWrite (AMQMessageStore.java:300)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.store.amq.AMQMessageStore$1.iterate (AMQMessageStore.java:92)
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.thread.DedicatedTaskRunner.runTask (DedicatedTaskRunner.java:9
INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
org.apache.activemq.thread.DedicatedTaskRunner$1.run (DedicatedTaskRunner.java:36)

After that stack trace, I start getting a lot more NullPointerExceptions in
the logs, however, they don't have a corresponding stack trace:

INFO   | jvm 1    | 2007/11/01 20:30:45 | WARN  AMQMessageStore
- Message could not be added to long term store: null
INFO | jvm 1 | 2007/11/01 20:30:45 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:30:45 | WARN  AMQMessageStore
- Message could not be added to long term store: null
INFO | jvm 1 | 2007/11/01 20:30:45 | java.lang.NullPointerException

Then, a different kind of exception, with stack trace:

INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
- Failed to fill batch
INFO | jvm 1 | 2007/11/01 20:31:20 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR IG
- Failed to page in more queue messages
INFO   | jvm 1    | 2007/11/01 20:31:20 | java.lang.RuntimeException:
java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.cursors.QueueStorePrefetch.hasNext (QueueStorePrefetch.java:112)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext (StoreQueueCursor.java:129)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.buildList(Queue.java:1022)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1005)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java: 1068)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.broker.region.Queue.iterate(Queue.java:941)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.thread.DedicatedTaskRunner.runTask (DedicatedTaskRunner.java:98)
INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
org.apache.activemq.thread.DedicatedTaskRunner$1.run (DedicatedTaskRunner.java:36)
INFO   | jvm 1    | 2007/11/01 20:31:20 | Caused by:
java.lang.NullPointerException

A couple other stack traces that look a lot like that, and then tons of:

INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
- Failed to fill batch
INFO | jvm 1 | 2007/11/01 20:31:20 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
- Failed to fill batch
INFO | jvm 1 | 2007/11/01 20:31:20 | java.lang.NullPointerException
INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
- Failed to fill batch


(Sorry, I haven't searched JIRA/Google for this one yet so this may be a
known issue)

I reran most of the test with the journaledJDBC connector but it's not
nearly as fast as the local store so I'm hoping to get this working. :-) I've made the indexBinSize change you've recommended and I'll re- run the tests tomorrow. BTW thanks for putting together that amq-message- store page
on the wiki. Thanks for the assistance!

-Andy









rajdavies wrote:

Hi andy,

did you start with a completely new AMQ Store data directory ? - You
could delete the references store only - e.g. the Recovery section in
http://activemq.apache.org/amq-message-store.html
Also - as you are storing a few million messages - you might want to
change the number of indexBinSize - the default number of has bins -
parameter to something large - like 16000 (the default is 1024).

I'l see if I can reproduce your latest problem with the
RecoveryListenerAdaptor.

cheers,

Rob

http://rajdavies.blogspot.com/



On Nov 2, 2007, at 2:16 AM, Andrew Miller wrote:



Thanks for the fix!  We're running the test now...  Parameters:

SVN version: 591163
Client jar: Updated to 5.0-20071028.173739

We haven't seen any of the original IndexOutOfBounds exception
messages,
even after about 1.2 million Enqueues.   However, the consumers
appear to be
stuck and unable to consume.     We had a RecoveryListenerAdapter
message
show up in the log, though I'm not sure this correlates with the
inability
of the consumers to consume these messages.

2007-11-01 18:10:42,645 [68.194.21:40095] ERROR
RecoveryListenerAdapter
- Message id ID:rlab-0007.xxxx.com-48174-119396578373
6-0:2:1:1:11 could not be recovered from the data store!

I even wrote a little perl client that does nothing but subscribe
to this
queue, consumes one message, and exits.  The script just blocks and
then
times out, even though there are many messages in the queue:

rlab-0007:/opt/activemq/bin# ./query -QQueue=IG
ACTIVEMQ_HOME: /opt/activemq
ACTIVEMQ_BASE: /opt/activemq
Type = Queue
DispatchCount = 213
Destination = IG
MaxEnqueueTime = 1371
QueueSize = 926441
Name = IG
DequeueCount = 213
MemoryPercentageUsed = 0
ConsumerCount = 19
MemoryLimit = 30198988
EnqueueCount = 1190424
MinEnqueueTime = 1
AverageEnqueueTime = 196.80751173708921

Here's the one-off perl script and the output: (I assume there's no
reason I
would be unable to consume a message sent via JMS in stomp/perl?)
(redacting script since my previous mail on this subject got
rejected for
being too spammy - suffice to say it simply subscribes to the queue in
question and does a stomp->receive_frame())

[EMAIL PROTECTED]:/opt/amiller$ ./c1.pl
connected
could not receive message after 20 seconds at ./c1.pl line 17,
<GEN0> line
4.

We also had this problem with the 20071031 snapshot, but at that
time I
thought it was because we were using the 4.1.1 client jars.  BTW
even though
we didn't see this issue in 20071002, we needed to upgrade to solve
AMQ-1479.

Thanks again!
-Andy




rajdavies wrote:

There's a potential fix for this I've just added to trunk - would it
be possible to build the latest and test ?


cheers,

Rob

http://rajdavies.blogspot.com/



On Nov 1, 2007, at 9:47 PM, Andrew Miller wrote:


We've been using ActiveMQ in an infrastructure refresh project, and
have run
into a nasty looking bug, it looks like we are dropping
messages.    We have
a few fast producers that will produce approximately 4-5 million
persistent
messages over the course of about 2.5 hours, while many slower
consumers
will consume these messages over the course of a day or so.

We are using Jencks/Spring for the producers and consumers, using
the
openwire protocol.  For this test, we were using the AMQ 4.1.1
client jars,
if that matters. (We will try updating to the new AMQ5 client jars
today)

About halfway through the production run, we started getting
IndexOutOfBounds exceptions in the log pretty frequently. Later on
the
consumer tries to get these messages but appears to fail.

(exception messages trimmed)



--
View this message in context: http://www.nabble.com/
IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739-
tf4734264s2354.html#a13541167
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.





--
View this message in context: http://www.nabble.com/ IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739- tf4734264s2354.html#a13543207
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.


Reply via email to