Jesse Fugitt created AMQ-5440:
---------------------------------
Summary: KahaDB error at startup "Looking for key N but not found
in fileMap"
Key: AMQ-5440
URL: https://issues.apache.org/jira/browse/AMQ-5440
Project: ActiveMQ
Issue Type: Bug
Components: Message Store
Affects Versions: 5.10.0
Reporter: Jesse Fugitt
Priority: Critical
After being shutdown uncleanly, KahaDB can hit a startup error at times that
causes the broker to fail to start up and potentially causes messages to be
re-assigned that are not marked as redelivered.
The log message at startup is:
2014-11-17 11:10:36,826 | ERROR | Looking for key 275 but not found in fileMap:
{305=db-305.log number = 305 , length = 8217, 304=db-304.log number = 304 ,
length = 8217, 307=db-307.log number = 307 , length = 8217, 306=db-306.log
number = 306 , length = 8217, 309=db-309.log number = 309 , length = 8217,
308=db-308.log number = 308 , length = 8217, 311=db-311.log number = 311 ,
length = 8217, 310=db-310.log number = 310 , length = 8217, 313=db-313.log
number = 313 , length = 8217, 312=db-312.log number = 312 , length = 8217,
314=db-314.log number = 314 , length = 317, 303=db-303.log number = 303 ,
length = 8433} | org.apache.activemq.store.kahadb.disk.journal.Journal | main
and the stack trace is:
Starting TestApp...
INFO | KahaDB is version 5
ERROR | Looking for key 275 but not found in fileMap: {305=db-305.log number =
305 , length = 8217, 304=db-304.log number = 304 , length = 8217,
307=db-307.log number = 307 , length = 8217, 306=db-306.log number = 306 ,
length = 8217, 309=db-309.log number = 309 , length = 8217, 308=db-308.log
number = 308 , length = 8217, 311=db-311.log number = 311 , length = 8217,
310=db-310.log number = 310 , length = 8217, 313=db-313.log number = 313 ,
length = 8217, 312=db-312.log number = 312 , length = 8217, 314=db-314.log
number = 314 , length = 317, 303=db-303.log number = 303 , length = 8433}
Exception in thread "main" java.io.IOException: Could not locate data file
KahaDB\db-275.log
at
org.apache.activemq.store.kahadb.disk.journal.Journal.getDataFile(Journal.java:353)
at
org.apache.activemq.store.kahadb.disk.journal.Journal.read(Journal.java:600)
at
org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:1014)
at
org.apache.activemq.store.kahadb.MessageDatabase.recoverProducerAudit(MessageDatabase.java:687)
at
org.apache.activemq.store.kahadb.MessageDatabase.recover(MessageDatabase.java:595)
at
org.apache.activemq.store.kahadb.MessageDatabase.open(MessageDatabase.java:400)
at
org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:418)
at
org.apache.activemq.store.kahadb.MessageDatabase.doStart(MessageDatabase.java:262)
at
org.apache.activemq.store.kahadb.KahaDBStore.doStart(KahaDBStore.java:194)
at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
at
org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStart(KahaDBPersistenceAdapter.java:215)
at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
at kahadbtest.TestApp.run(TestApp.java:29)
at kahadbtest.TestApp.main(TestApp.java:21)
This was fairly hard to reproduce without unclean shutdown but the attached log
and "broken" KahaDB folder should help debug the problem. Also, I will attach
the small test app that exercises the KahaDB APIs that I was using to cause the
invalid state (I normally start and stop the app a few times until the problem
appears at startup at which point it will no longer start).
Some initial debugging looks like it might be related to the way that message
acks are stored via the metadata serialization and how that interacts with the
GC timer but I didn't see anything obvious.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)