Re: IndexOutOfBoundsException in AMQMessageStore using 20071028.173739

Andrew Miller Tue, 06 Nov 2007 19:27:56 -0800


Rob,


We did try again with a few issues.  For now, we've added some state
tracking to our database schema so that we can regenerate items that should
be in a particular state - therefore we've been able to remove the
requirement for a persistent store.  If activemq crashes, we can regenerate
the queues by checking the state in the database.  Unfortunately the logs
have rolled since that time so I don't have any results to report. :-(

Right now the project is nearing completion so it will be a while before I
can do some more load/stability testing using the AMQStore - however once
things cool down around here I'd love to help out all I can in making AMQ
better - we're impressed so far with the features and performance,
especially after 2 years of fighting with custom "NIH" framework code.  :-)

-Andy



rajdavies wrote:
> 
> I wonder if you could take the latest again?
> 
> 
> cheers,
> 
> Rob
> 
> http://rajdavies.blogspot.com/
> 
> 
> 
> On Nov 2, 2007, at 7:26 AM, Andrew Miller wrote:
> 
>>
>>
>> Rob,
>>
>> Yeah, in general, I've been deleting the entire local store  
>> directory in
>> between tests, since we're just testing it is not critical that the  
>> messages
>> are persisted. (with the message queue down of course)   
>> Interestingly, I
>> restarted the broker to try to fix this problem.  It marked the  
>> store as bad
>> and regenerated the index from the journal (about an hour+ with  
>> ~1.5-2m
>> messages).
>>
>> The consumers started working again at that point, for a while.   
>> After it
>> reached ~7200 dispatched messages, it really blew up.  I got this  
>> in the
>> wrapper.log:
>>
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |  
>> java.lang.NullPointerException
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.MessageIdMarshaller.writePayload 
>> (MessageIdMarshaller.java:38)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.MessageIdMarshaller.writePayload 
>> (MessageIdMarshaller.java:29)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.impl.index.hash.HashEntry.write 
>> (HashEntry.java:93)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.impl.index.hash.HashPage.write 
>> (HashPage.java:123)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.impl.index.hash.HashIndex.writeFullPage 
>> (HashIndex.java:350)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.impl.index.hash.HashPageInfo.end 
>> (HashPageInfo.java:98)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.impl.index.hash.HashBin.end(HashBin.java:299)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.impl.index.hash.HashBin.put(HashBin.java:153)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.impl.index.hash.HashIndex.store 
>> (HashIndex.java:241)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.kaha.impl.container.MapContainerImpl.put 
>> (MapContainerImpl.java:303)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.store.kahadaptor.KahaReferenceStore.addMessageRefe 
>> rence(KahaReferenceStore.java:113)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.store.amq.AMQMessageStore$4.execute 
>> (AMQMessageStore.java:341)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.util.TransactionTemplate.run 
>> (TransactionTemplate.java:44)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.store.amq.AMQMessageStore.doAsyncWrite 
>> (AMQMessageStore.java:330)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.store.amq.AMQMessageStore.asyncWrite 
>> (AMQMessageStore.java:300)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.store.amq.AMQMessageStore$1.iterate 
>> (AMQMessageStore.java:92)
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.thread.DedicatedTaskRunner.runTask 
>> (DedicatedTaskRunner.java:9
>> INFO   | jvm 1    | 2007/11/01 20:30:09 |       at
>> org.apache.activemq.thread.DedicatedTaskRunner$1.run 
>> (DedicatedTaskRunner.java:36)
>>
>> After that stack trace, I start getting a lot more  
>> NullPointerExceptions in
>> the logs, however, they don't have a corresponding stack trace:
>>
>> INFO   | jvm 1    | 2007/11/01 20:30:45 | WARN  AMQMessageStore
>> - Message could not be added to long term store: null
>> INFO   | jvm 1    | 2007/11/01 20:30:45 |  
>> java.lang.NullPointerException
>> INFO   | jvm 1    | 2007/11/01 20:30:45 | WARN  AMQMessageStore
>> - Message could not be added to long term store: null
>> INFO   | jvm 1    | 2007/11/01 20:30:45 |  
>> java.lang.NullPointerException
>>
>> Then, a different kind of exception, with stack trace:
>>
>> INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
>> - Failed to fill batch
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |  
>> java.lang.NullPointerException
>> INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR IG
>> - Failed to page in more queue messages
>> INFO   | jvm 1    | 2007/11/01 20:31:20 | java.lang.RuntimeException:
>> java.lang.NullPointerException
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
>> org.apache.activemq.broker.region.cursors.QueueStorePrefetch.hasNext 
>> (QueueStorePrefetch.java:112)
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
>> org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext 
>> (StoreQueueCursor.java:129)
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
>> org.apache.activemq.broker.region.Queue.buildList(Queue.java:1022)
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
>> org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1005)
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
>> org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java: 
>> 1068)
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
>> org.apache.activemq.broker.region.Queue.iterate(Queue.java:941)
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
>> org.apache.activemq.thread.DedicatedTaskRunner.runTask 
>> (DedicatedTaskRunner.java:98)
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |       at
>> org.apache.activemq.thread.DedicatedTaskRunner$1.run 
>> (DedicatedTaskRunner.java:36)
>> INFO   | jvm 1    | 2007/11/01 20:31:20 | Caused by:
>> java.lang.NullPointerException
>>
>> A couple other stack traces that look a lot like that, and then  
>> tons of:
>>
>> INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
>> - Failed to fill batch
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |  
>> java.lang.NullPointerException
>> INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
>> - Failed to fill batch
>> INFO   | jvm 1    | 2007/11/01 20:31:20 |  
>> java.lang.NullPointerException
>> INFO   | jvm 1    | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch
>> - Failed to fill batch
>>
>>
>> (Sorry, I haven't searched JIRA/Google for this one yet so this may  
>> be a
>> known issue)
>>
>> I reran most of the test with the journaledJDBC connector but it's not
>> nearly as fast as the local store so I'm hoping to get this  
>> working.  :-)
>> I've made the indexBinSize change you've recommended and I'll re- 
>> run the
>> tests tomorrow.  BTW thanks for putting together that amq-message- 
>> store page
>> on the wiki. Thanks for the assistance!
>>
>> -Andy
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> rajdavies wrote:
>>>
>>> Hi andy,
>>>
>>> did you start with a completely new AMQ Store data directory ? - You
>>> could delete the references store only - e.g. the Recovery section in
>>> http://activemq.apache.org/amq-message-store.html
>>> Also - as you are storing a few million messages - you might want to
>>> change the number of indexBinSize - the default number of has bins -
>>> parameter to something large - like 16000 (the default is 1024).
>>>
>>> I'l see if I can reproduce your latest problem with the
>>> RecoveryListenerAdaptor.
>>>
>>> cheers,
>>>
>>> Rob
>>>
>>> http://rajdavies.blogspot.com/
>>>
>>>
>>>
>>> On Nov 2, 2007, at 2:16 AM, Andrew Miller wrote:
>>>
>>>>
>>>>
>>>> Thanks for the fix!  We're running the test now...  Parameters:
>>>>
>>>> SVN version: 591163
>>>> Client jar: Updated to 5.0-20071028.173739
>>>>
>>>> We haven't seen any of the original IndexOutOfBounds exception
>>>> messages,
>>>> even after about 1.2 million Enqueues.   However, the consumers
>>>> appear to be
>>>> stuck and unable to consume.     We had a RecoveryListenerAdapter
>>>> message
>>>> show up in the log, though I'm not sure this correlates with the
>>>> inability
>>>> of the consumers to consume these messages.
>>>>
>>>> 2007-11-01 18:10:42,645 [68.194.21:40095] ERROR
>>>> RecoveryListenerAdapter
>>>> - Message id ID:rlab-0007.xxxx.com-48174-119396578373
>>>> 6-0:2:1:1:11 could not be recovered from the data store!
>>>>
>>>> I even wrote a little perl client that does nothing but subscribe
>>>> to this
>>>> queue, consumes one message, and exits.  The script just blocks and
>>>> then
>>>> times out, even though there are many messages in the queue:
>>>>
>>>> rlab-0007:/opt/activemq/bin# ./query -QQueue=IG
>>>> ACTIVEMQ_HOME: /opt/activemq
>>>> ACTIVEMQ_BASE: /opt/activemq
>>>> Type = Queue
>>>> DispatchCount = 213
>>>> Destination = IG
>>>> MaxEnqueueTime = 1371
>>>> QueueSize = 926441
>>>> Name = IG
>>>> DequeueCount = 213
>>>> MemoryPercentageUsed = 0
>>>> ConsumerCount = 19
>>>> MemoryLimit = 30198988
>>>> EnqueueCount = 1190424
>>>> MinEnqueueTime = 1
>>>> AverageEnqueueTime = 196.80751173708921
>>>>
>>>> Here's the one-off perl script and the output: (I assume there's no
>>>> reason I
>>>> would be unable to consume a message sent via JMS in stomp/perl?)
>>>> (redacting script since my previous mail on this subject got
>>>> rejected for
>>>> being too spammy - suffice to say it simply subscribes to the  
>>>> queue in
>>>> question and does a stomp->receive_frame())
>>>>
>>>> [EMAIL PROTECTED]:/opt/amiller$ ./c1.pl
>>>> connected
>>>> could not receive message after 20 seconds at ./c1.pl line 17,
>>>> <GEN0> line
>>>> 4.
>>>>
>>>> We also had this problem with the 20071031 snapshot, but at that
>>>> time I
>>>> thought it was because we were using the 4.1.1 client jars.  BTW
>>>> even though
>>>> we didn't see this issue in 20071002, we needed to upgrade to solve
>>>> AMQ-1479.
>>>>
>>>> Thanks again!
>>>> -Andy
>>>>
>>>>
>>>>
>>>>
>>>> rajdavies wrote:
>>>>>
>>>>> There's a potential fix for this I've just added to trunk -  
>>>>> would it
>>>>> be possible to build the latest and test ?
>>>>>
>>>>>
>>>>> cheers,
>>>>>
>>>>> Rob
>>>>>
>>>>> http://rajdavies.blogspot.com/
>>>>>
>>>>>
>>>>>
>>>>> On Nov 1, 2007, at 9:47 PM, Andrew Miller wrote:
>>>>>
>>>>>>
>>>>>> We've been using ActiveMQ in an infrastructure refresh project,  
>>>>>> and
>>>>>> have run
>>>>>> into a nasty looking bug, it looks like we are dropping
>>>>>> messages.    We have
>>>>>> a few fast producers that will produce approximately 4-5 million
>>>>>> persistent
>>>>>> messages over the course of about 2.5 hours, while many slower
>>>>>> consumers
>>>>>> will consume these messages over the course of a day or so.
>>>>>>
>>>>>> We are using Jencks/Spring for the producers and consumers, using
>>>>>> the
>>>>>> openwire protocol.  For this test, we were using the AMQ 4.1.1
>>>>>> client jars,
>>>>>> if that matters.  (We will try updating to the new AMQ5 client  
>>>>>> jars
>>>>>> today)
>>>>>>
>>>>>> About halfway through the production run, we started getting
>>>>>> IndexOutOfBounds exceptions in the log pretty frequently.   
>>>>>> Later on
>>>>>> the
>>>>>> consumer tries to get these messages but appears to fail.
>>>>>
>>>>> (exception messages trimmed)
>>>>>
>>>>>
>>>>
>>>> -- 
>>>> View this message in context: http://www.nabble.com/
>>>> IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739-
>>>> tf4734264s2354.html#a13541167
>>>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
>>>>
>>>
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/ 
>> IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739- 
>> tf4734264s2354.html#a13543207
>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739-tf4734264s2354.html#a13620710
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: IndexOutOfBoundsException in AMQMessageStore using 20071028.173739

Reply via email to