[ https://issues.apache.org/jira/browse/AMQ-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578235#comment-13578235 ]

Rob Waite commented on AMQ-3725:
--------------------------------

We ran into a similar issue, but we are missing some of the exceptions shown in 
the description. We are running 5.7.0.

{noformat} 
KahaDB failed to store to Journal | 
org.apache.activemq.store.kahadb.MessageDatabase | 
ConcurrentQueueStoreAndDispatch
java.io.SyncFailedException: sync failed
        at java.io.FileDescriptor.sync(Native Method)
        at 
org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:367)
        at 
org.apache.kahadb.journal.DataFileAppender$1.run(DataFileAppender.java:188)
2013-02-13 17:07:57,427 | INFO  | Stopping the broker due to exception, 
java.io.SyncFailedException: sync failed | 
org.apache.activemq.util.DefaultIOExceptionHandler | 
ConcurrentQueueStoreAndDispatch
java.io.SyncFailedException: sync failed
        at java.io.FileDescriptor.sync(Native Method)
        at 
org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:367)
        at 
org.apache.kahadb.journal.DataFileAppender$1.run(DataFileAppender.java:188)
2013-02-13 17:07:57,517 | INFO  | Apache ActiveMQ 5.7.0 (habroker, 
ID:HOSTNAME-35138-1360038011880-0:1) is shutting down | 
org.apache.activemq.broker.BrokerService | Stopping the broker due to IO 
exception

{noformat} 

After this, AMQ immediately went into shutdown mode. We did not see the 
FileNotFoundException, the checkpoint-recovery messages with the IOException, 
or the transport failures from EOFException. We did see transport failures on 
the clients: the Java client got an EOFException, and our C# client, which uses 
NMS, reported "Unable to read beyond the end of the stream".
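The shutdown itself comes from DefaultIOExceptionHandler, which by default stops the broker on an unrecoverable IO error (the "Stopping the broker due to exception" line above). As a minimal sketch, the handler can be tuned in activemq.xml; the attribute values below are illustrative, not our production config, and trading a shutdown for ignored IO errors risks a corrupt store:

```xml
<!-- Sketch: tuning the broker's IO exception handler (values illustrative). -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="habroker">
  <ioExceptionHandler>
    <!-- DefaultIOExceptionHandler is what logged the shutdown message above.
         ignoreAllErrors="true" would keep the broker up through IO failures,
         at the risk of leaving KahaDB with an invalid index (per this issue). -->
    <defaultIOExceptionHandler ignoreAllErrors="false"
                               stopStartConnectors="true"/>
  </ioExceptionHandler>
</broker>
```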

When we restarted AMQ, it would immediately shut down because journal files 
were missing. We then set ignoreMissingJournalfiles="true" on the kahaDB 
element in activemq.xml. This time AMQ came up and reported that it had lost 
33 messages.
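The change amounts to one attribute on the kahaDB persistence adapter; a minimal sketch (the directory path is illustrative, not ours):

```xml
<!-- Sketch of the change: allow startup despite missing journal files. -->
<persistenceAdapter>
  <kahaDB directory="${activemq.data}/kahadb"
          ignoreMissingJournalfiles="true"/>
</persistenceAdapter>
```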

We then saw a number of errors (they actually appear several times, in maybe 
three variations, almost as if multiple loggers are firing):

{noformat}
2013-02-13 17:46:14,892 | ERROR | Problem retrieving message for browse | 
org.apache.activemq.broker.region.Queue | ActiveMQ Broker[habroker] Scheduler
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: 
Invalid location: 131:3886117, : java.lang.NegativeArraySizeException
        at 
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:113)
        at 
org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
        at org.apache.activemq.broker.region.Queue.doBrowse(Queue.java:1066)
        at 
org.apache.activemq.broker.region.Queue.expireMessages(Queue.java:832)
        at org.apache.activemq.broker.region.Queue.access$100(Queue.java:98)
        at org.apache.activemq.broker.region.Queue$2.run(Queue.java:138)
        at 
org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
        at java.util.TimerThread.mainLoop(Timer.java:534)
        at java.util.TimerThread.run(Timer.java:484)
Caused by: java.lang.RuntimeException: java.io.IOException: Invalid location: 
131:3886117, : java.lang.NegativeArraySizeException
        at 
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:277)
        at 
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:110)
        ... 8 more
Caused by: java.io.IOException: Invalid location: 131:3886117, : 
java.lang.NegativeArraySizeException
        at 
org.apache.kahadb.journal.DataFileAccessor.readRecord(DataFileAccessor.java:92)
        at org.apache.kahadb.journal.Journal.read(Journal.java:604)
        at 
org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:879)
        at 
org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1030)
        at 
org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:558)
        at org.apache.kahadb.page.Transaction.execute(Transaction.java:769)
        at 
org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:547)
        at 
org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:106)
        at 
org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:97)
        at 
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:274)
        ... 9 more
{noformat}

All of this happened during an 8-second window. Then, about 13 minutes later, 
we saw more:

{noformat}
2013-02-13 17:59:55,562 | ERROR | Failed to page in more queue messages  | 
org.apache.activemq.broker.region.Queue | ActiveMQ BrokerService[habroker] 
Task-38
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: 
Invalid location: 125:11971180, : java.lang.NegativeArraySizeException
        at 
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.hasNext(AbstractStoreCursor.java:150)
        at 
org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext(StoreQueueCursor.java:131)
        at 
org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1766)
        at 
org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1993)
        at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1486)
        at 
org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
        at 
org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.RuntimeException: java.io.IOException: Invalid location: 
125:11971180, : java.lang.NegativeArraySizeException
        at 
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:277)
        at 
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.hasNext(AbstractStoreCursor.java:147)
        ... 9 more
Caused by: java.io.IOException: Invalid location: 125:11971180, : 
java.lang.NegativeArraySizeException
        at 
org.apache.kahadb.journal.DataFileAccessor.readRecord(DataFileAccessor.java:92)
        at org.apache.kahadb.journal.Journal.read(Journal.java:604)
        at 
org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:879)
        at 
org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1030)
        at 
org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:558)
        at org.apache.kahadb.page.Transaction.execute(Transaction.java:769)
        at 
org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:547)
        at 
org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:106)
        at 
org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:97)
        at 
org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:274)
        ... 10 more
{noformat}

We haven't seen any further errors since the one above, about 6 hours ago. I 
just pushed 1000 messages through with no errors; things seem normal.

We are not yet sure what happened to our SAN or its connectivity during this 
time and are still analyzing. We will attempt to reproduce the failure to 
understand the best way to mitigate it.
                
> Kahadb error during SAN failover delayed write - Allow kahaDB to recover in a 
> similar manner as the JDBC store using the IOExceptionHandler
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-3725
>                 URL: https://issues.apache.org/jira/browse/AMQ-3725
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Message Store
>    Affects Versions: 5.5.1
>            Reporter: Jason Sherman
>             Fix For: 5.9.0
>
>
> An issue can arise that causes the broker to terminate when using kahaDB with 
> a SAN, when the SAN fails over.  In this case the failover process is 
> seamless however, on fail back there is a 2-3 sec delay where writes are 
> blocked and the broker terminates.  With the JDBC datastore a similar 
> situation can be handled by using the IOExceptionHandler.  However with 
> kahaDB, when this same IOExceptionHandler is added it prevents the broker 
> from terminating but kahaDB retains an invalid index.
> {code}
>  INFO | ActiveMQ JMS Message Broker (Broker1, 
> ID:macbookpro-251a.home-56915-1328715089252-0:1) started
>  INFO | jetty-7.1.6.v20100715
>  INFO | ActiveMQ WebConsole initialized.
>  INFO | Initializing Spring FrameworkServlet 'dispatcher'
>  INFO | ActiveMQ Console at http://0.0.0.0:8161/admin
>  INFO | ActiveMQ Web Demos at http://0.0.0.0:8161/demo
>  INFO | RESTful file access application at http://0.0.0.0:8161/fileserver
>  INFO | FUSE Web Console at http://0.0.0.0:8161/console
>  INFO | Started [email protected]:8161
> ERROR | KahaDB failed to store to Journal
> java.io.SyncFailedException: sync failed
>       at java.io.FileDescriptor.sync(Native Method)
>       at 
> org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:382)
>       at 
> org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  INFO | Ignoring IO exception, java.io.SyncFailedException: sync failed
> java.io.SyncFailedException: sync failed
>       at java.io.FileDescriptor.sync(Native Method)
>       at 
> org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:382)
>       at 
> org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
> ERROR | Checkpoint failed
> java.io.SyncFailedException: sync failed
>       at java.io.FileDescriptor.sync(Native Method)
>       at 
> org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:382)
>       at 
> org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  INFO | Ignoring IO exception, java.io.SyncFailedException: sync failed
> java.io.SyncFailedException: sync failed
>       at java.io.FileDescriptor.sync(Native Method)
>       at 
> org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:382)
>       at 
> org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
> ERROR | KahaDB failed to store to Journal
> java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such 
> file or directory)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>       at 
> org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.java:70)
>       at 
> org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:324)
>       at 
> org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  INFO | Ignoring IO exception, java.io.FileNotFoundException: 
> /Volumes/NAS-01/data/kahadb/db-1.log (No such file or directory)
> java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such 
> file or directory)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>       at 
> org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.java:70)
>       at 
> org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:324)
>       at 
> org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
> ERROR | KahaDB failed to store to Journal
> java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such 
> file or directory)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>       at 
> org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.java:70)
>       at 
> org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:324)
>       at 
> org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  INFO | Ignoring IO exception, java.io.FileNotFoundException: 
> /Volumes/NAS-01/data/kahadb/db-1.log (No such file or directory)
> java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such 
> file or directory)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>       at 
> org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.java:70)
>       at 
> org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:324)
>       at 
> org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  WARN | Transport failed: java.io.EOFException
>  WARN | Transport failed: java.io.EOFException
>  INFO | KahaDB: Recovering checkpoint thread after death
> ERROR | Checkpoint failed
> java.io.IOException: Input/output error
>       at java.io.RandomAccessFile.write(Native Method)
>       at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:1006)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:484)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1290)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase$10.execute(MessageDatabase.java:768)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:760)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:766)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:315)
>  INFO | Ignoring IO exception, java.io.IOException: Input/output error
> java.io.IOException: Input/output error
>       at java.io.RandomAccessFile.write(Native Method)
>       at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:1006)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:484)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1290)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase$10.execute(MessageDatabase.java:768)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:760)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:766)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:315)
>  INFO | KahaDB: Recovering checkpoint thread after death
> ERROR | Checkpoint failed
> java.io.IOException: Input/output error
>       at java.io.RandomAccessFile.write(Native Method)
>       at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:1006)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:484)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1290)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase$10.execute(MessageDatabase.java:768)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:760)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:766)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:315)
>  INFO | Ignoring IO exception, java.io.IOException: Input/output error
> java.io.IOException: Input/output error
>       at java.io.RandomAccessFile.write(Native Method)
>       at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:1006)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:484)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1290)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase$10.execute(MessageDatabase.java:768)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:760)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:766)
>       at 
> org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:315)
>  WARN | Transport failed: java.io.EOFException
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira