[jira] [Commented] (AMQ-5440) KahaDB error at startup "Looking for key N but not found in fileMap"

Jesse Fugitt (JIRA) Thu, 20 Nov 2014 13:51:48 -0800

    [ 
https://issues.apache.org/jira/browse/AMQ-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220075#comment-14220075
 ]


Jesse Fugitt commented on AMQ-5440:
-----------------------------------

Debugged this a little more but the root cause isn't obvious even after adding 
more trace debugging.  Basically, the metadata that is written to disk at the 
time the checkpoint cleanup was done looks correct and would not cause a 
startup error because it is not referencing the files that are being cleaned 
and deleted from disk.  However, upon restart, a different set of metadata is 
loaded from disk which is old and still references a file that was deleted 
during the last checkpoint cleanup which causes the exception during 
open/recover as shown in the stack trace.  One workaround I tried was to force 
a rebuild of the index through journal replay for any exception thrown by the 
recover method (as shown below) but it doesn't feel like this is really getting 
to the root of the problem.

//inside the open method:
public void open() throws IOException {
...
           startCheckpoint();
           //recover //instead try to catch exceptions and rebuild the index
            try {
                recover();
            } catch (IOException ex) {
                LOG.warn("Recovery failure during open. Recovering the index 
through journal replay.", ex);
                // try to recover index
                try {
                    pageFile.unload();
                } catch (Exception ignore) {}
                if (archiveCorruptedIndex) {
                    pageFile.archive();
                } else {
                    pageFile.delete();
                }
                metadata = createMetadata();
                pageFile = null;
                loadPageFile();
                recover();
            }

> KahaDB error at startup "Looking for key N but not found in fileMap"
> --------------------------------------------------------------------
>
>                 Key: AMQ-5440
>                 URL: https://issues.apache.org/jira/browse/AMQ-5440
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Message Store
>    Affects Versions: 5.10.0
>            Reporter: Jesse Fugitt
>            Priority: Critical
>         Attachments: KahaDB.zip, TestApp.java, kahadbtest.log
>
>
> After being shutdown uncleanly, KahaDB can hit a startup error at times that 
> causes the broker to fail to start up and potentially causes messages to be 
> re-assigned that are not marked as redelivered.
> The log message at startup is:
> 2014-11-17 11:10:36,826 | ERROR | Looking for key 275 but not found in 
> fileMap: {305=db-305.log number = 305 , length = 8217, 304=db-304.log number 
> = 304 , length = 8217, 307=db-307.log number = 307 , length = 8217, 
> 306=db-306.log number = 306 , length = 8217, 309=db-309.log number = 309 , 
> length = 8217, 308=db-308.log number = 308 , length = 8217, 311=db-311.log 
> number = 311 , length = 8217, 310=db-310.log number = 310 , length = 8217, 
> 313=db-313.log number = 313 , length = 8217, 312=db-312.log number = 312 , 
> length = 8217, 314=db-314.log number = 314 , length = 317, 303=db-303.log 
> number = 303 , length = 8433} | 
> org.apache.activemq.store.kahadb.disk.journal.Journal | main
> and the stack trace is:
> Starting TestApp...
>  INFO | KahaDB is version 5
> ERROR | Looking for key 275 but not found in fileMap: {305=db-305.log number 
> = 305 , length = 8217, 304=db-304.log number = 304 , length = 8217, 
> 307=db-307.log number = 307 , length = 8217, 306=db-306.log number = 306 , 
> length = 8217, 309=db-309.log number = 309 , length = 8217, 308=db-308.log 
> number = 308 , length = 8217, 311=db-311.log number = 311 , length = 8217, 
> 310=db-310.log number = 310 , length = 8217, 313=db-313.log number = 313 , 
> length = 8217, 312=db-312.log number = 312 , length = 8217, 314=db-314.log 
> number = 314 , length = 317, 303=db-303.log number = 303 , length = 8433}
> Exception in thread "main" java.io.IOException: Could not locate data file 
> KahaDB\db-275.log
>       at 
> org.apache.activemq.store.kahadb.disk.journal.Journal.getDataFile(Journal.java:353)
>       at 
> org.apache.activemq.store.kahadb.disk.journal.Journal.read(Journal.java:600)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:1014)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.recoverProducerAudit(MessageDatabase.java:687)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.recover(MessageDatabase.java:595)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.open(MessageDatabase.java:400)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:418)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.doStart(MessageDatabase.java:262)
>       at 
> org.apache.activemq.store.kahadb.KahaDBStore.doStart(KahaDBStore.java:194)
>       at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
>       at 
> org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStart(KahaDBPersistenceAdapter.java:215)
>       at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
>       at kahadbtest.TestApp.run(TestApp.java:29)
>       at kahadbtest.TestApp.main(TestApp.java:21)
> This was fairly hard to reproduce without unclean shutdown but the attached 
> log and "broken" KahaDB folder should help debug the problem.  Also, I will 
> attach the small test app that exercises the KahaDB APIs that I was using to 
> cause the invalid state (I normally start and stop the app a few times until 
> the problem appears at startup at which point it will no longer start).
> Some initial debugging looks like it might be related to the way that message 
> acks are stored via the metadata serialization and how that interacts with 
> the GC timer but I didn't see anything obvious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (AMQ-5440) KahaDB error at startup "Looking for key N but not found in fileMap"

Reply via email to