Rob,
We did try again with a few issues. For now, we've added some state tracking to our database schema so that we can regenerate items that should be in a particular state - therefore we've been able to remove the requirement for a persistent store. If activemq crashes, we can regenerate the queues by checking the state in the database. Unfortunately the logs have rolled since that time so I don't have any results to report. :-( Right now the project is nearing completion so it will be a while before I can do some more load/stability testing using the AMQStore - however once things cool down around here I'd love to help out all I can in making AMQ better - we're impressed so far with the features and performance, especially after 2 years of fighting with custom "NIH" framework code. :-) -Andy rajdavies wrote: > > I wonder if you could take the latest again? > > > cheers, > > Rob > > http://rajdavies.blogspot.com/ > > > > On Nov 2, 2007, at 7:26 AM, Andrew Miller wrote: > >> >> >> Rob, >> >> Yeah, in general, I've been deleting the entire local store >> directory in >> between tests, since we're just testing it is not critical that the >> messages >> are persisted. (with the message queue down of course) >> Interestingly, I >> restarted the broker to try to fix this problem. It marked the >> store as bad >> and regenerated the index from the journal (about an hour+ with >> ~1.5-2m >> messages). >> >> The consumers started working again at that point, for a while. >> After it >> reached ~7200 dispatched messages, it really blew up. I got this >> in the >> wrapper.log: >> >> INFO | jvm 1 | 2007/11/01 20:30:09 | >> java.lang.NullPointerException >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.MessageIdMarshaller.writePayload >> (MessageIdMarshaller.java:38) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.MessageIdMarshaller.writePayload >> (MessageIdMarshaller.java:29) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.impl.index.hash.HashEntry.write >> (HashEntry.java:93) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.impl.index.hash.HashPage.write >> (HashPage.java:123) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.impl.index.hash.HashIndex.writeFullPage >> (HashIndex.java:350) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.impl.index.hash.HashPageInfo.end >> (HashPageInfo.java:98) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.impl.index.hash.HashBin.end(HashBin.java:299) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.impl.index.hash.HashBin.put(HashBin.java:153) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.impl.index.hash.HashIndex.store >> (HashIndex.java:241) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.kaha.impl.container.MapContainerImpl.put >> (MapContainerImpl.java:303) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.store.kahadaptor.KahaReferenceStore.addMessageRefe >> rence(KahaReferenceStore.java:113) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.store.amq.AMQMessageStore$4.execute >> (AMQMessageStore.java:341) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.util.TransactionTemplate.run >> (TransactionTemplate.java:44) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.store.amq.AMQMessageStore.doAsyncWrite >> (AMQMessageStore.java:330) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.store.amq.AMQMessageStore.asyncWrite >> (AMQMessageStore.java:300) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.store.amq.AMQMessageStore$1.iterate >> (AMQMessageStore.java:92) >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.thread.DedicatedTaskRunner.runTask >> (DedicatedTaskRunner.java:9 >> INFO | jvm 1 | 2007/11/01 20:30:09 | at >> org.apache.activemq.thread.DedicatedTaskRunner$1.run >> (DedicatedTaskRunner.java:36) >> >> After that stack trace, I start getting a lot more >> NullPointerExceptions in >> the logs, however, they don't have a corresponding stack trace: >> >> INFO | jvm 1 | 2007/11/01 20:30:45 | WARN AMQMessageStore >> - Message could not be added to long term store: null >> INFO | jvm 1 | 2007/11/01 20:30:45 | >> java.lang.NullPointerException >> INFO | jvm 1 | 2007/11/01 20:30:45 | WARN AMQMessageStore >> - Message could not be added to long term store: null >> INFO | jvm 1 | 2007/11/01 20:30:45 | >> java.lang.NullPointerException >> >> Then, a different kind of exception, with stack trace: >> >> INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch >> - Failed to fill batch >> INFO | jvm 1 | 2007/11/01 20:31:20 | >> java.lang.NullPointerException >> INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR IG >> - Failed to page in more queue messages >> INFO | jvm 1 | 2007/11/01 20:31:20 | java.lang.RuntimeException: >> java.lang.NullPointerException >> INFO | jvm 1 | 2007/11/01 20:31:20 | at >> org.apache.activemq.broker.region.cursors.QueueStorePrefetch.hasNext >> (QueueStorePrefetch.java:112) >> INFO | jvm 1 | 2007/11/01 20:31:20 | at >> org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext >> (StoreQueueCursor.java:129) >> INFO | jvm 1 | 2007/11/01 20:31:20 | at >> org.apache.activemq.broker.region.Queue.buildList(Queue.java:1022) >> INFO | jvm 1 | 2007/11/01 20:31:20 | at >> org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1005) >> INFO | jvm 1 | 2007/11/01 20:31:20 | at >> org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java: >> 1068) >> INFO | jvm 1 | 2007/11/01 20:31:20 | at >> org.apache.activemq.broker.region.Queue.iterate(Queue.java:941) >> INFO | jvm 1 | 2007/11/01 20:31:20 | at >> org.apache.activemq.thread.DedicatedTaskRunner.runTask >> (DedicatedTaskRunner.java:98) >> INFO | jvm 1 | 2007/11/01 20:31:20 | at >> org.apache.activemq.thread.DedicatedTaskRunner$1.run >> (DedicatedTaskRunner.java:36) >> INFO | jvm 1 | 2007/11/01 20:31:20 | Caused by: >> java.lang.NullPointerException >> >> A couple other stack traces that look a lot like that, and then >> tons of: >> >> INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch >> - Failed to fill batch >> INFO | jvm 1 | 2007/11/01 20:31:20 | >> java.lang.NullPointerException >> INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch >> - Failed to fill batch >> INFO | jvm 1 | 2007/11/01 20:31:20 | >> java.lang.NullPointerException >> INFO | jvm 1 | 2007/11/01 20:31:20 | ERROR QueueStorePrefetch >> - Failed to fill batch >> >> >> (Sorry, I haven't searched JIRA/Google for this one yet so this may >> be a >> known issue) >> >> I reran most of the test with the journaledJDBC connector but it's not >> nearly as fast as the local store so I'm hoping to get this >> working. :-) >> I've made the indexBinSize change you've recommended and I'll re- >> run the >> tests tomorrow. BTW thanks for putting together that amq-message- >> store page >> on the wiki. Thanks for the assistance! >> >> -Andy >> >> >> >> >> >> >> >> >> >> rajdavies wrote: >>> >>> Hi andy, >>> >>> did you start with a completely new AMQ Store data directory ? - You >>> could delete the references store only - e.g. the Recovery section in >>> http://activemq.apache.org/amq-message-store.html >>> Also - as you are storing a few million messages - you might want to >>> change the number of indexBinSize - the default number of has bins - >>> parameter to something large - like 16000 (the default is 1024). >>> >>> I'l see if I can reproduce your latest problem with the >>> RecoveryListenerAdaptor. >>> >>> cheers, >>> >>> Rob >>> >>> http://rajdavies.blogspot.com/ >>> >>> >>> >>> On Nov 2, 2007, at 2:16 AM, Andrew Miller wrote: >>> >>>> >>>> >>>> Thanks for the fix! We're running the test now... Parameters: >>>> >>>> SVN version: 591163 >>>> Client jar: Updated to 5.0-20071028.173739 >>>> >>>> We haven't seen any of the original IndexOutOfBounds exception >>>> messages, >>>> even after about 1.2 million Enqueues. However, the consumers >>>> appear to be >>>> stuck and unable to consume. We had a RecoveryListenerAdapter >>>> message >>>> show up in the log, though I'm not sure this correlates with the >>>> inability >>>> of the consumers to consume these messages. >>>> >>>> 2007-11-01 18:10:42,645 [68.194.21:40095] ERROR >>>> RecoveryListenerAdapter >>>> - Message id ID:rlab-0007.xxxx.com-48174-119396578373 >>>> 6-0:2:1:1:11 could not be recovered from the data store! >>>> >>>> I even wrote a little perl client that does nothing but subscribe >>>> to this >>>> queue, consumes one message, and exits. The script just blocks and >>>> then >>>> times out, even though there are many messages in the queue: >>>> >>>> rlab-0007:/opt/activemq/bin# ./query -QQueue=IG >>>> ACTIVEMQ_HOME: /opt/activemq >>>> ACTIVEMQ_BASE: /opt/activemq >>>> Type = Queue >>>> DispatchCount = 213 >>>> Destination = IG >>>> MaxEnqueueTime = 1371 >>>> QueueSize = 926441 >>>> Name = IG >>>> DequeueCount = 213 >>>> MemoryPercentageUsed = 0 >>>> ConsumerCount = 19 >>>> MemoryLimit = 30198988 >>>> EnqueueCount = 1190424 >>>> MinEnqueueTime = 1 >>>> AverageEnqueueTime = 196.80751173708921 >>>> >>>> Here's the one-off perl script and the output: (I assume there's no >>>> reason I >>>> would be unable to consume a message sent via JMS in stomp/perl?) >>>> (redacting script since my previous mail on this subject got >>>> rejected for >>>> being too spammy - suffice to say it simply subscribes to the >>>> queue in >>>> question and does a stomp->receive_frame()) >>>> >>>> [EMAIL PROTECTED]:/opt/amiller$ ./c1.pl >>>> connected >>>> could not receive message after 20 seconds at ./c1.pl line 17, >>>> <GEN0> line >>>> 4. >>>> >>>> We also had this problem with the 20071031 snapshot, but at that >>>> time I >>>> thought it was because we were using the 4.1.1 client jars. BTW >>>> even though >>>> we didn't see this issue in 20071002, we needed to upgrade to solve >>>> AMQ-1479. >>>> >>>> Thanks again! >>>> -Andy >>>> >>>> >>>> >>>> >>>> rajdavies wrote: >>>>> >>>>> There's a potential fix for this I've just added to trunk - >>>>> would it >>>>> be possible to build the latest and test ? >>>>> >>>>> >>>>> cheers, >>>>> >>>>> Rob >>>>> >>>>> http://rajdavies.blogspot.com/ >>>>> >>>>> >>>>> >>>>> On Nov 1, 2007, at 9:47 PM, Andrew Miller wrote: >>>>> >>>>>> >>>>>> We've been using ActiveMQ in an infrastructure refresh project, >>>>>> and >>>>>> have run >>>>>> into a nasty looking bug, it looks like we are dropping >>>>>> messages. We have >>>>>> a few fast producers that will produce approximately 4-5 million >>>>>> persistent >>>>>> messages over the course of about 2.5 hours, while many slower >>>>>> consumers >>>>>> will consume these messages over the course of a day or so. >>>>>> >>>>>> We are using Jencks/Spring for the producers and consumers, using >>>>>> the >>>>>> openwire protocol. For this test, we were using the AMQ 4.1.1 >>>>>> client jars, >>>>>> if that matters. (We will try updating to the new AMQ5 client >>>>>> jars >>>>>> today) >>>>>> >>>>>> About halfway through the production run, we started getting >>>>>> IndexOutOfBounds exceptions in the log pretty frequently. >>>>>> Later on >>>>>> the >>>>>> consumer tries to get these messages but appears to fail. >>>>> >>>>> (exception messages trimmed) >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: http://www.nabble.com/ >>>> IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739- >>>> tf4734264s2354.html#a13541167 >>>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com. >>>> >>> >>> >>> >> >> -- >> View this message in context: http://www.nabble.com/ >> IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739- >> tf4734264s2354.html#a13543207 >> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com. >> > > > -- View this message in context: http://www.nabble.com/IndexOutOfBoundsException-in-AMQMessageStore-using-20071028.173739-tf4734264s2354.html#a13620710 Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
