[
https://issues.apache.org/jira/browse/AMQ-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419046#comment-15419046
]
Gary Tully commented on AMQ-5815:
---------------------------------
https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=330f0ce deals with
the enableIndexWriteAsync=true case.
For the original description from Andrew, I can't yet see where the write lock
is not getting released.
> Frequency IOException cast dead lock while using kahadb in high concurrency
> environment
> ---------------------------------------------------------------------------------------
>
> Key: AMQ-5815
> URL: https://issues.apache.org/jira/browse/AMQ-5815
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.10.0
> Environment: Master/Slave AMQ + kahadb(NAS using NFS)
> Reporter: Andrew
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> When the concurrency is in high level, iops more than 3000/s, we find that
> most of the amq connections are blocked.We can't either get or push any data
> through them unless restart.This issue appears *frequency*.
> There is many IOExceptions in the amq logs.
> {code}
> 2015-05-28 00:37:08,544 | ERROR | Checkpoint failed |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
> Checkpoint Worker
> java.io.IOException: Input/output error
> at java.io.RandomAccessFile.write(Native Method)[:1.6.0_30]
> at
> java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)[:1.6.0_30]
> at
> org.apache.activemq.util.RecoverableRandomAccessFile.writeLong(RecoverableRandomAccessFile.java:305)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.page.PageFile.writeBatch(PageFile.java:1062)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.page.PageFile.flush(PageFile.java:516)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1512)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase$17.execute(MessageDatabase.java:1484)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1481)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:929)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:357)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> 2015-05-28 00:37:08,547 | INFO | Ignoring IO exception, java.io.IOException:
> Input/output error | org.apache.activemq.util.DefaultIOExceptionHandler |
> ActiveMQ Journal Checkpoint Worker
> java.io.IOException: Input/output error
> at java.io.RandomAccessFile.write(Native Method)[:1.6.0_30]
> at
> java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)[:1.6.0_30]
> at
> org.apache.activemq.util.RecoverableRandomAccessFile.writeLong(RecoverableRandomAccessFile.java:305)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.page.PageFile.writeBatch(PageFile.java:1062)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.page.PageFile.flush(PageFile.java:516)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1512)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase$17.execute(MessageDatabase.java:1484)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1481)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:929)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:357)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> 2015-05-28 00:37:08,580 | INFO | KahaDB: Recovering checkpoint thread after
> death | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ
> Transport: tcp:///10.40.50.31:58171@61616
> 2015-05-28 00:37:17,285 | INFO | Journal failed while writing at: 2359276 |
> org.apache.activemq.store.kahadb.disk.journal.DataFileAppender | ActiveMQ
> Data File Writer
> 2015-05-28 00:37:17,287 | ERROR | KahaDB failed to store to Journal |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Transport:
> tcp:///10.40.50.17:4487@61616
> java.io.IOException: Input/output error
> at sun.nio.ch.FileChannelImpl.force0(Native Method)[:1.6.0_30]
> at
> sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348)[:1.6.0_30]
> at
> org.apache.activemq.util.RecoverableRandomAccessFile.sync(RecoverableRandomAccessFile.java:402)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.processQueue(DataFileAppender.java:373)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.journal.DataFileAppender$1.run(DataFileAppender.java:193)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> 2015-05-28 00:37:17,291 | ERROR | KahaDB failed to store to Journal |
> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Transport:
> tcp:///10.40.50.17:4442@61616
> java.io.IOException: Input/output error
> at sun.nio.ch.FileChannelImpl.force0(Native Method)[:1.6.0_30]
> at
> sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348)[:1.6.0_30]
> at
> org.apache.activemq.util.RecoverableRandomAccessFile.sync(RecoverableRandomAccessFile.java:402)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.processQueue(DataFileAppender.java:373)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.journal.DataFileAppender$1.run(DataFileAppender.java:193)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> 2015-05-28 00:37:17,293 | INFO | Ignoring IO exception, java.io.IOException:
> Input/output error | org.apache.activemq.util.DefaultIOExceptionHandler |
> ActiveMQ Transport: tcp:///10.40.50.17:4442@61616
> java.io.IOException: Input/output error
> at sun.nio.ch.FileChannelImpl.force0(Native Method)[:1.6.0_30]
> at
> sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348)[:1.6.0_30]
> at
> org.apache.activemq.util.RecoverableRandomAccessFile.sync(RecoverableRandomAccessFile.java:402)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.processQueue(DataFileAppender.java:373)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> at
> org.apache.activemq.store.kahadb.disk.journal.DataFileAppender$1.run(DataFileAppender.java:193)[activemq-kahadb-store-5.10.0.jar:5.10.0]
> {code}
> With the aid of jstack ,we find dead lock(How can I upload the whole jtack
> log?)
> The blocked connections are all waiting for 0x00007f90500034e0
> {code}
> [01:08:01] "ActiveMQ Transport: tcp:///10.40.50.17:4479@61616" daemon prio=10
> tid=0x00007f8f8c015000 nid=0x2ece waiting for monitor entry
> [0x00007f8f1f6e4000]
> [01:08:01] java.lang.Thread.State: BLOCKED (on object monitor)
> [01:08:01] at
> org.apache.activemq.transaction.LocalTransaction.rollback(LocalTransaction.java:94)
> [01:08:01] - waiting to lock <0x00007f90500034e0> (a
> org.apache.activemq.store.kahadb.KahaDBTransactionStore)
> [01:08:01] at
> org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:74)
> [01:08:01] at
> org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)
> [01:08:01] at
> org.apache.activemq.broker.MutableBrokerFilter.commitTransaction(MutableBrokerFilter.java:112)
> [01:08:01] at
> org.apache.activemq.broker.TransportConnection.processCommitTransactionOnePhase(TransportConnection.java:453)
> [01:08:01] at
> org.apache.activemq.command.TransactionInfo.visit(TransactionInfo.java:100)
> [01:08:01] at
> org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:294)
> [01:08:01] at
> org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:148)
> [01:08:01] at
> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
> [01:08:01] at
> org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
> [01:08:01] at
> org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:270)
> [01:08:01] at
> org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
> [01:08:01] at
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
> [01:08:01] at
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
> [01:08:01] at java.lang.Thread.run(Thread.java:662)
> {code}
> Only one connection(Thread) get this lock.But it is waiting for another Read
> lock 0x00007f90530ed8d8
> {code}
> [01:08:01] "ActiveMQ Transport: tcp:///10.40.50.17:4484@61616" daemon prio=10
> tid=0x00007f8fa8039800 nid=0x2ed2 waiting on condition [0x00007f8f1f2e0000]
> [01:08:01] java.lang.Thread.State: WAITING (parking)
> [01:08:01] at sun.misc.Unsafe.park(Native Method)
> [01:08:01] - parking to wait for <0x00007f90530ed8d8> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> [01:08:01] at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
> [01:08:01] at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> [01:08:01] at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
> [01:08:01] at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
> [01:08:01] at
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
> [01:08:01] at
> org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:972)
> [01:08:01] at
> org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:958)
> [01:08:01] at
> org.apache.activemq.store.kahadb.KahaDBTransactionStore.rollback(KahaDBTransactionStore.java:312)
> [01:08:01] at
> org.apache.activemq.transaction.LocalTransaction.rollback(LocalTransaction.java:94)
> [01:08:01] - locked <0x00007f90500034e0> (a
> org.apache.activemq.store.kahadb.KahaDBTransactionStore)
> [01:08:01] at
> org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:74)
> [01:08:01] at
> org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)
> [01:08:01] at
> org.apache.activemq.broker.MutableBrokerFilter.commitTransaction(MutableBrokerFilter.java:112)
> [01:08:01] at
> org.apache.activemq.broker.TransportConnection.processCommitTransactionOnePhase(TransportConnection.java:453)
> [01:08:01] at
> org.apache.activemq.command.TransactionInfo.visit(TransactionInfo.java:100)
> [01:08:01] at
> org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:294)
> [01:08:01] at
> org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:148)
> [01:08:01] at
> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
> [01:08:01] at
> org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
> [01:08:01] at
> org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:270)
> [01:08:01] at
> org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
> [01:08:01] at
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
> [01:08:01] at
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
> [01:08:01] at java.lang.Thread.run(Thread.java:662)
> {code}
> At last we find 0x00007f90530ed8d8 is in many write lock waiting.But none
> write lock hodes it. The dead lock occurs.
> {code}
> [01:07:56] "ActiveMQ Transport: tcp:///10.40.50.47:60701@61616" daemon
> prio=10 tid=0x00007f8f7cbc5000 nid=0x3ec8 waiting on condition
> [0x00007f900d864000]
> [01:07:56] java.lang.Thread.State: WAITING (parking)
> [01:07:56] at sun.misc.Unsafe.park(Native Method)
> [01:07:56] - parking to wait for <0x00007f90530ed8d8> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> [01:07:56] at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
> [01:07:56] at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> [01:07:56] at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
> [01:07:56] at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
> [01:07:56] at
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
> [01:07:56] at
> org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:972)
> [01:07:56] at
> org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:958)
> [01:07:56] at
> org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.removeMessage(KahaDBStore.java:453)
> [01:07:56] at
> org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.removeAsyncMessage(KahaDBStore.java:407)
> [01:07:56] at
> org.apache.activemq.store.kahadb.KahaDBTransactionStore.removeAsyncMessage(KahaDBTransactionStore.java:498)
> [01:07:56] at
> org.apache.activemq.store.kahadb.KahaDBTransactionStore$1.removeAsyncMessage(KahaDBTransactionStore.java:185)
> [01:07:56] at
> org.apache.activemq.broker.region.Queue.acknowledge(Queue.java:998)
> [01:07:56] at
> org.apache.activemq.broker.region.Queue.removeMessage(Queue.java:1781)
> [01:07:56] at
> org.apache.activemq.broker.region.QueueSubscription.acknowledge(QueueSubscription.java:63)
> [01:07:56] at
> org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(PrefetchSubscription.java:235)
> [01:07:56] - locked <0x00007f905d10b1e8> (a java.lang.Object)
> [01:07:56] at
> org.apache.activemq.broker.region.AbstractRegion.acknowledge(AbstractRegion.java:446)
> [01:07:56] at
> org.apache.activemq.broker.region.RegionBroker.acknowledge(RegionBroker.java:461)
> [01:07:56] at
> org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:82)
> [01:07:56] at
> org.apache.activemq.broker.TransactionBroker.acknowledge(TransactionBroker.java:277)
> [01:07:56] at
> org.apache.activemq.broker.MutableBrokerFilter.acknowledge(MutableBrokerFilter.java:92)
> [01:07:56] at
> org.apache.activemq.broker.TransportConnection.processMessageAck(TransportConnection.java:505)
> [01:07:56] at
> org.apache.activemq.command.MessageAck.visit(MessageAck.java:245)
> [01:07:56] at
> org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:294)
> [01:07:56] at
> org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:148)
> [01:07:56] at
> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
> [01:07:56] at
> org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
> [01:07:56] at
> org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:270)
> [01:07:56] at
> org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
> [01:07:56] at
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
> [01:07:56] at
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
> [01:07:56] at java.lang.Thread.run(Thread.java:662)
> {code}
> In my opinion, IOExceptions may lead to losing persisten messages and so on
> ,but the broker shouldn't stop service.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)