[ https://issues.apache.org/jira/browse/AMQ-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578646#comment-13578646 ]

Rob Waite commented on AMQ-3725:
--------------------------------

We made a build of 5.7.0 with the stopStartConnectors flag set to true in 
DefaultIOExceptionHandler. Locally, I used a thumb drive as the data directory, 
got producers and consumers working, and then yanked the drive. We reproduced a 
stack trace similar to the one above, but still a bit different.
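
For reference, this is roughly how we wired the handler in our test build (a 
minimal sketch, not the exact build change; the broker name and data directory 
are placeholders):

{code}
// Sketch: enable stop/restart of transport connectors on store IO errors.
// BrokerService and DefaultIOExceptionHandler are the real ActiveMQ classes;
// the broker name and data directory below are illustrative only.
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.util.DefaultIOExceptionHandler;

public class StopStartBroker {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        broker.setBrokerName("habroker");
        broker.setDataDirectory("E:/kahadb"); // the thumb drive in our local test

        // On an IOException from KahaDB, stop the transport connectors, wait
        // for a store checkpoint to succeed, then restart the connectors.
        DefaultIOExceptionHandler handler = new DefaultIOExceptionHandler();
        handler.setStopStartConnectors(true);
        broker.setIoExceptionHandler(handler);

        broker.start();
        broker.waitUntilStopped();
    }
}
{code}

With that in place, yanking the drive produced the trace below.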

{noformat}
2013-02-14 11:16:16,032 | ERROR | KahaDB failed to store to Journal | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Transport: tcp:///0:0:0:0:0:0:0:1:54836@61616
java.io.SyncFailedException: sync failed
                at java.io.FileDescriptor.sync(Native Method)
                at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:367)
                at org.apache.kahadb.journal.DataFileAppender$1.run(DataFileAppender.java:188)
2013-02-14 11:16:16,035 | INFO  | Initiating stop/restart of broker transport due to IO exception, java.io.SyncFailedException: sync failed | org.apache.activemq.util.DefaultIOExceptionHandler | ActiveMQ Transport: tcp:///0:0:0:0:0:0:0:1:54836@61616
java.io.SyncFailedException: sync failed
                at java.io.FileDescriptor.sync(Native Method)
                at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:367)
                at org.apache.kahadb.journal.DataFileAppender$1.run(DataFileAppender.java:188)
2013-02-14 11:16:16,041 | INFO  | waiting for broker persistence adapter checkpoint to succeed before restarting transports | org.apache.activemq.util.DefaultIOExceptionHandler | restart transport connectors post IO exception
2013-02-14 11:16:16,062 | WARN  | Transport Connection to: tcp://0:0:0:0:0:0:0:1:54836 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///0:0:0:0:0:0:0:1:54836@61616
2013-02-14 11:16:16,268 | ERROR | KahaDB failed to store to Journal | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Transport: tcp:///0:0:0:0:0:0:0:1:54848@61616
java.io.IOException: The volume for a file has been externally altered so that the opened file is no longer valid
                at java.io.RandomAccessFile.readBytes(Native Method)
                at java.io.RandomAccessFile.read(RandomAccessFile.java:338)
                at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397)
                at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
                at org.apache.kahadb.page.PageFile.readPage(PageFile.java:876)

{noformat}

Then we got into a loop waiting for a checkpoint to succeed before restarting 
transports (a rough sketch of the loop follows the log):

{noformat}
2013-02-14 11:16:21,046 | INFO  | KahaDB: Recovering checkpoint thread after death | org.apache.activemq.store.kahadb.MessageDatabase | restart transport connectors post IO exception
2013-02-14 11:16:21,049 | INFO  | waiting for broker persistence adapter checkpoint to succeed before restarting transports | org.apache.activemq.util.DefaultIOExceptionHandler | restart transport connectors post IO exception
2013-02-14 11:16:26,055 | ERROR | Checkpoint failed | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
java.io.IOException: The volume for a file has been externally altered so that the opened file is no longer valid
                at java.io.RandomAccessFile.write(Native Method)
                at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)
                at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:1077)
                at org.apache.kahadb.page.PageFile.flush(PageFile.java:515)
                at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1313)
                at org.apache.activemq.store.kahadb.MessageDatabase$10.execute(MessageDatabase.java:769)
                at org.apache.kahadb.page.Transaction.execute(Transaction.java:769)
                at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:767)
                at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:323)
2013-02-14 11:16:26,064 | INFO  | waiting for broker persistence adapter checkpoint to succeed before restarting transports | org.apache.activemq.util.DefaultIOExceptionHandler | restart transport connectors post IO exception
2013-02-14 11:16:31,069 | INFO  | KahaDB: Recovering checkpoint thread after death | org.apache.activemq.store.kahadb.MessageDatabase | restart transport connectors post IO exception

{noformat}
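
The attempts above are 5 seconds apart, which looks like the handler's 
resumeCheckSleepPeriod. Roughly, the retry thread appears to do something like 
this (a paraphrase with hypothetical helper names, not the actual 
DefaultIOExceptionHandler source):

{code}
// Paraphrase of the "restart transport connectors post IO exception" loop
// suggested by the log. checkpointStore() and restartTransportConnectors()
// are stand-ins, not real ActiveMQ methods.
public class CheckpointRetryLoop {
    static final long RESUME_CHECK_SLEEP_PERIOD = 5000; // ms, matches the log spacing

    static void checkpointStore() throws java.io.IOException { /* stand-in */ }
    static void restartTransportConnectors() { /* stand-in */ }

    public static void main(String[] args) {
        new Thread(() -> {
            while (true) {
                try {
                    checkpointStore();
                    restartTransportConnectors();
                    return; // transports restarted, loop ends
                } catch (java.io.IOException e) {
                    // "waiting for broker persistence adapter checkpoint to
                    // succeed before restarting transports"
                }
                try {
                    Thread.sleep(RESUME_CHECK_SLEEP_PERIOD);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "restart transport connectors post IO exception").start();
    }
}
{code}

In our case the checkpoint kept failing because the volume was gone, so the 
loop never exited.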

So the broker would not come back automatically... I then restarted AMQ and got 
the following:

{noformat}
2013-02-14 11:33:46,398 | INFO  | Using Persistence Adapter: KahaDBPersistenceAdapter[E:\kahadb] | org.apache.activemq.broker.BrokerService | main
2013-02-14 11:33:47,892 | INFO  | Corrupt journal records found in 'E:\kahadb\db-2.log' between offsets: 5576149..6343375 | org.apache.kahadb.journal.Journal | main
2013-02-14 11:33:47,905 | INFO  | Corrupt journal records found in 'E:\kahadb\db-2.log' between offsets: 6351482..6356885 | org.apache.kahadb.journal.Journal | main
2013-02-14 11:33:56,253 | INFO  | KahaDB is version 4 | org.apache.activemq.store.kahadb.MessageDatabase | main
2013-02-14 11:33:56,373 | INFO  | Recovering from the journal ... | org.apache.activemq.store.kahadb.MessageDatabase | main
2013-02-14 11:33:56,643 | INFO  | Recovery replayed 478 operations from the journal in 0.292 seconds. | org.apache.activemq.store.kahadb.MessageDatabase | main
2013-02-14 11:33:56,702 | INFO  | Scheduler using directory: E:\habroker\scheduler | org.apache.activemq.broker.scheduler.SchedulerBroker | main
2013-02-14 11:33:56,726 | INFO  | Apache ActiveMQ 5.7.0 (habroker, ID:kpc-eng-d27-55891-1360870436389-0:1) is starting | org.apache.activemq.broker.BrokerService | main
2013-02-14 11:33:56,748 | INFO  | ignoring zero length, partially initialised journal data file: db-1.log number = 1 , length = 0 | org.apache.kahadb.journal.Journal | main
2013-02-14 11:33:57,343 | INFO  | JobSchedulerStore:E:\habroker\scheduler started | org.apache.activemq.broker.scheduler.JobSchedulerStore | main
{noformat}

This is a bit different from what we saw in our QA environment, where the 
broker reported 33 lost messages.

I looked at the history of the file, and in 5.5.1 stopStartConnectors is still 
false. I'm not really sure what kicked off the retry behavior in 5.5.1 vs. the 
broker stop in 5.7.0.
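
For comparison, the two handler modes side by side (again a sketch; both 
setters are real properties of DefaultIOExceptionHandler, and the "Ignoring IO 
exception" lines in the original report look like the ignoreAllErrors branch):

{code}
// ignoreAllErrors presumably matches the "Ignoring IO exception, ..." lines
// in the original report; stopStartConnectors matches the stop/restart
// behaviour in our 5.7.0 test build.
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.util.DefaultIOExceptionHandler;

public class HandlerModes {
    // 5.5.1-era workaround from the original report: log and carry on,
    // at the risk of an invalid KahaDB index.
    static void ignoreMode(BrokerService broker) {
        DefaultIOExceptionHandler h = new DefaultIOExceptionHandler();
        h.setIgnoreAllErrors(true);
        broker.setIoExceptionHandler(h);
    }

    // What we tested on 5.7.0: stop connectors, wait for a checkpoint,
    // then restart them.
    static void stopStartMode(BrokerService broker) {
        DefaultIOExceptionHandler h = new DefaultIOExceptionHandler();
        h.setStopStartConnectors(true);
        broker.setIoExceptionHandler(h);
    }
}
{code}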
                
> Kahadb error during SAN failover delayed write - Allow kahaDB to recover in a 
> similar manner as the JDBC store using the IOExceptionHandler
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-3725
>                 URL: https://issues.apache.org/jira/browse/AMQ-3725
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Message Store
>    Affects Versions: 5.5.1
>            Reporter: Jason Sherman
>             Fix For: 5.9.0
>
>
> An issue can arise that causes the broker to terminate when using kahaDB with 
> a SAN, when the SAN fails over. In this case the failover process itself is 
> seamless; however, on failback there is a 2-3 second delay during which writes 
> are blocked, and the broker terminates. With the JDBC datastore a similar 
> situation can be handled by using the IOExceptionHandler. However, when this 
> same IOExceptionHandler is added with kahaDB, it prevents the broker from 
> terminating but kahaDB retains an invalid index.
> {code}
>  INFO | ActiveMQ JMS Message Broker (Broker1, ID:macbookpro-251a.home-56915-1328715089252-0:1) started
>  INFO | jetty-7.1.6.v20100715
>  INFO | ActiveMQ WebConsole initialized.
>  INFO | Initializing Spring FrameworkServlet 'dispatcher'
>  INFO | ActiveMQ Console at http://0.0.0.0:8161/admin
>  INFO | ActiveMQ Web Demos at http://0.0.0.0:8161/demo
>  INFO | RESTful file access application at http://0.0.0.0:8161/fileserver
>  INFO | FUSE Web Console at http://0.0.0.0:8161/console
>  INFO | Started [email protected]:8161
> ERROR | KahaDB failed to store to Journal
> java.io.SyncFailedException: sync failed
>       at java.io.FileDescriptor.sync(Native Method)
>       at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:382)
>       at org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  INFO | Ignoring IO exception, java.io.SyncFailedException: sync failed
> java.io.SyncFailedException: sync failed
>       at java.io.FileDescriptor.sync(Native Method)
>       at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:382)
>       at org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
> ERROR | Checkpoint failed
> java.io.SyncFailedException: sync failed
>       at java.io.FileDescriptor.sync(Native Method)
>       at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:382)
>       at org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  INFO | Ignoring IO exception, java.io.SyncFailedException: sync failed
> java.io.SyncFailedException: sync failed
>       at java.io.FileDescriptor.sync(Native Method)
>       at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:382)
>       at org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
> ERROR | KahaDB failed to store to Journal
> java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such file or directory)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>       at org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.java:70)
>       at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:324)
>       at org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  INFO | Ignoring IO exception, java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such file or directory)
> java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such file or directory)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>       at org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.java:70)
>       at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:324)
>       at org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
> ERROR | KahaDB failed to store to Journal
> java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such file or directory)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>       at org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.java:70)
>       at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:324)
>       at org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  INFO | Ignoring IO exception, java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such file or directory)
> java.io.FileNotFoundException: /Volumes/NAS-01/data/kahadb/db-1.log (No such file or directory)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>       at org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.java:70)
>       at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileAppender.java:324)
>       at org.apache.kahadb.journal.DataFileAppender$2.run(DataFileAppender.java:203)
>  WARN | Transport failed: java.io.EOFException
>  WARN | Transport failed: java.io.EOFException
>  INFO | KahaDB: Recovering checkpoint thread after death
> ERROR | Checkpoint failed
> java.io.IOException: Input/output error
>       at java.io.RandomAccessFile.write(Native Method)
>       at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:1006)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:484)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1290)
>       at org.apache.activemq.store.kahadb.MessageDatabase$10.execute(MessageDatabase.java:768)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:760)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:766)
>       at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:315)
>  INFO | Ignoring IO exception, java.io.IOException: Input/output error
> java.io.IOException: Input/output error
>       at java.io.RandomAccessFile.write(Native Method)
>       at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:1006)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:484)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1290)
>       at org.apache.activemq.store.kahadb.MessageDatabase$10.execute(MessageDatabase.java:768)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:760)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:766)
>       at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:315)
>  INFO | KahaDB: Recovering checkpoint thread after death
> ERROR | Checkpoint failed
> java.io.IOException: Input/output error
>       at java.io.RandomAccessFile.write(Native Method)
>       at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:1006)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:484)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1290)
>       at org.apache.activemq.store.kahadb.MessageDatabase$10.execute(MessageDatabase.java:768)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:760)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:766)
>       at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:315)
>  INFO | Ignoring IO exception, java.io.IOException: Input/output error
> java.io.IOException: Input/output error
>       at java.io.RandomAccessFile.write(Native Method)
>       at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:1006)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:484)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1290)
>       at org.apache.activemq.store.kahadb.MessageDatabase$10.execute(MessageDatabase.java:768)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:760)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:766)
>       at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:315)
>  WARN | Transport failed: java.io.EOFException
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira