[ 
https://issues.apache.org/activemq/browse/AMQ-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Snyder updated AMQ-2595:
------------------------------

    Fix Version/s: 5.5.0
                       (was: 5.4.1)

> NFS problems can cause broker to hang on shutdown requiring a hard kill 
> ------------------------------------------------------------------------
>
>                 Key: AMQ-2595
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2595
>             Project: ActiveMQ
>          Issue Type: New Feature
>          Components: Broker
>    Affects Versions: 5.3.0
>            Reporter: Bruce Snyder
>             Fix For: 5.5.0
>
>
> When the ActiveMQ data directory exists on a NFS mounted volume and that 
> volume experiences some type of failure, it can cause the broker to hang and 
> only a manual kill will shut it down. Below is a sequence of errors 
> demonstrating this:
> {code}
> ERROR | Failed to fill batch
> java.io.IOException: I/O error
>       at java.io.RandomAccessFile.readBytes(Native Method)
>       at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>       at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
>       at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361)
>       at org.apache.kahadb.journal.DataFileAccessor.readRecord(DataFileAccessor.java:90)
>       at org.apache.kahadb.journal.Journal.read(Journal.java:573)
>       at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:656)
>       at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:534)
>       at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:229)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
>       at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:222)
>       at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:81)
>       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:227)
>       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:99)
>       at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
>       at org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1363)
>       at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1503)
>       at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1178)
>       at org.apache.activemq.thread.DeterministicTaskRunner.runTask(DeterministicTaskRunner.java:84)
>       at org.apache.activemq.thread.DeterministicTaskRunner$1.run(DeterministicTaskRunner.java:41)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> ERROR | Failed to fill batch
> java.lang.RuntimeException: java.io.IOException: I/O error
>       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:230)
>       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:99)
>       at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
>       at org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1363)
>       at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1503)
>       at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1178)
>       at org.apache.activemq.thread.DeterministicTaskRunner.runTask(DeterministicTaskRunner.java:84)
>       at org.apache.activemq.thread.DeterministicTaskRunner$1.run(DeterministicTaskRunner.java:41)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: I/O error
>       at java.io.RandomAccessFile.readBytes(Native Method)
>       at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>       at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
>       at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361)
>       at org.apache.kahadb.journal.DataFileAccessor.readRecord(DataFileAccessor.java:90)
>       at org.apache.kahadb.journal.Journal.read(Journal.java:573)
>       at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:656)
>       at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:534)
>       at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:229)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
>       at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:222)
>       at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:81)
>       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:227)
>       ... 10 more
> ERROR | Failed to page in more queue messages
> java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: I/O error
>       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:102)
>       at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
>       at org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1363)
>       at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1503)
>       at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1178)
>       at org.apache.activemq.thread.DeterministicTaskRunner.runTask(DeterministicTaskRunner.java:84)
>       at org.apache.activemq.thread.DeterministicTaskRunner$1.run(DeterministicTaskRunner.java:41)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: java.io.IOException: I/O error
>       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:230)
>       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:99)
>       ... 9 more
> Caused by: java.io.IOException: I/O error
>       at java.io.RandomAccessFile.readBytes(Native Method)
>       at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>       at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
>       at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361)
>       at org.apache.kahadb.journal.DataFileAccessor.readRecord(DataFileAccessor.java:90)
>       at org.apache.kahadb.journal.Journal.read(Journal.java:573)
>       at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:656)
>       at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:534)
>       at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:229)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
>       at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:222)
>       at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:81)
>       at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:227)
>       ... 10 more
> java.io.IOException: I/O error
>       at java.io.RandomAccessFile.write(Native Method)
>       at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:967)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:974)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:443)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:947)
>       at org.apache.activemq.store.kahadb.MessageDatabase.access$200(MessageDatabase.java:79)
>       at org.apache.activemq.store.kahadb.MessageDatabase$9.execute(MessageDatabase.java:587)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:585)
>       at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:256)
> Exception in thread "ActiveMQ Journal Checkpoint Worker" java.lang.NullPointerException
>       at java.util.zip.Adler32.update(Adler32.java:44)
>       at org.apache.kahadb.page.PageFile.writeBatch(PageFile.java:962)
>       at org.apache.kahadb.page.PageFile.flush(PageFile.java:443)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:947)
>       at org.apache.activemq.store.kahadb.MessageDatabase.access$200(MessageDatabase.java:79)
>       at org.apache.activemq.store.kahadb.MessageDatabase$9.execute(MessageDatabase.java:587)
>       at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
>       at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:585)
>       at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:256)
> ^C INFO | ActiveMQ Message Broker (dynamic-broker1, ID:tssol10s-32974-1265131581698-0:0) is shutting down
> INFO | Connector vm://dynamic-broker1 Stopped
> INFO | dynamic-broker1 bridge to dynamic-broker2 stopped
> INFO | Network Connector localhost Stopped
> INFO | The connection to '/127.0.0.1:32982' is taking a long time to shutdown.
> [the previous INFO line repeated until the broker process was killed]
> {code}
> In this case, a problem occurred with the NFS mount point where the data 
> directory lives, which resulted in the I/O error. After a ctrl-c was sent to 
> shut down the broker, it would not shut down on its own and had to be 
> killed. Only then could failover take place, allowing another broker to 
> grab the lock.
> Given that the broker cannot recover from such a problem, it should probably 
> force itself to shut down so that failover can take place. Also, such a 
> failure should be logged as FATAL rather than ERROR.
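
The suggested behavior could be sketched roughly as follows. This is hypothetical code, not ActiveMQ's actual shutdown path: the class name, method names, and the "always unrecoverable" classification are all illustrative. The key point is halting the JVM outright rather than running a normal shutdown, since shutdown hooks would block on the same dead NFS mount; killing the process releases the store lock so a standby broker can take over.

```java
import java.io.IOException;

// Hypothetical sketch of "fail fast on unrecoverable store I/O errors".
public class FatalIoShutdown {

    // Decide whether an I/O failure is worth dying for. NFS outages tend to
    // surface as generic "I/O error"s; a real implementation might retry a
    // few times before giving up. Here we treat every IOException as fatal.
    static boolean isUnrecoverable(IOException e) {
        return true;
    }

    static void onStoreError(IOException e) {
        if (isUnrecoverable(e)) {
            // Log at the highest severity, as the issue suggests (FATAL,
            // not ERROR), then halt. Runtime.halt skips shutdown hooks,
            // which would otherwise hang on the dead mount.
            System.err.println("FATAL: persistent store failure, halting broker: " + e);
            Runtime.getRuntime().halt(1);
        }
    }

    public static void main(String[] args) {
        // Demo of the classification path only (no halt).
        System.out.println(isUnrecoverable(new IOException("I/O error"))); // prints "true"
    }
}
```

The deliberate choice here is `Runtime.halt` over `System.exit`: an orderly stop would try to flush KahaDB and close connections, both of which block on the failed NFS volume, reproducing exactly the hang described above.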

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to