[
https://issues.apache.org/jira/browse/QPID-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Keith Wall resolved QPID-6997.
------------------------------
Resolution: Fixed
> [Java Broker, BDBStore] HA JE Transaction commit might end up in
> NullPointerException when commit is invoked when majority is lost
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: QPID-6997
> URL: https://issues.apache.org/jira/browse/QPID-6997
> Project: Qpid
> Issue Type: Bug
> Components: Java Broker
> Affects Versions: qpid-java-6.0
> Reporter: Alex Rudyy
> Fix For: qpid-java-6.0.1, qpid-java-6.1
>
> Attachments:
> TEST-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testTransferMasterWhilstTransactionsInFlight.txt.NPE
>
>
> On running some tests for BDB HA with default settings
> (ReplicaAckPolicy=SIMPLE_MAJORITY) involving publishing messages in parallel
> from multiple connections and disconnecting replica nodes in the middle of
> publishing transactions, in some rare circumstances the publishing
> transaction JE commit failed with NPE thrown from JE code as below:
> {noformat}
> ########################################################################
> #
> # Unhandled Exception java.lang.NullPointerException in Thread
> IO-/127.0.0.1:35408
> #
> # Exiting
> #
> ########################################################################
> java.lang.NullPointerException
> at
> com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:130)
> at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1130)
> at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:191)
> at com.sleepycat.je.txn.Txn.commit(Txn.java:756)
> at com.sleepycat.je.Transaction.doCommit(Transaction.java:609)
> at com.sleepycat.je.Transaction.commit(Transaction.java:410)
> at
> org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commitAsync(ReplicatedEnvironmentFacade.java:302)
> at
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.commitTranAsyncImpl(AbstractBDBMessageStore.java:794)
> at
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$1200(AbstractBDBMessageStore.java:74)
> at
> org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1364)
> at
> org.apache.qpid.server.txn.LocalTransaction.commitAsync(LocalTransaction.java:399)
> at
> org.apache.qpid.server.protocol.v0_8.AMQChannel.commit(AMQChannel.java:1220)
> at
> org.apache.qpid.server.protocol.v0_8.AMQChannel.receiveTxCommit(AMQChannel.java:3622)
> at org.apache.qpid.codec.ServerDecoder.processMethod(ServerDecoder.java:228)
> at org.apache.qpid.codec.AMQDecoder.processFrame(AMQDecoder.java:191)
> at
> org.apache.qpid.server.protocol.v0_8.BrokerDecoder.doProcessFrame(BrokerDecoder.java:114)
> at
> org.apache.qpid.server.protocol.v0_8.BrokerDecoder.access$000(BrokerDecoder.java:36)
> at
> org.apache.qpid.server.protocol.v0_8.BrokerDecoder$1.run(BrokerDecoder.java:78)
> at
> org.apache.qpid.server.protocol.v0_8.BrokerDecoder$1.run(BrokerDecoder.java:74)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> org.apache.qpid.server.protocol.v0_8.BrokerDecoder.processFrame(BrokerDecoder.java:73)
> at org.apache.qpid.codec.AMQDecoder.processInput(AMQDecoder.java:173)
> at org.apache.qpid.codec.AMQDecoder.decode(AMQDecoder.java:114)
> at org.apache.qpid.codec.ServerDecoder.decodeBuffer(ServerDecoder.java:43)
> at
> org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8$1.run(AMQPConnection_0_8.java:266)
> at
> org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8$1.run(AMQPConnection_0_8.java:258)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8.received(AMQPConnection_0_8.java:257)
> at
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:142)
> at
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:547)
> at
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> at
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:446)
> at
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:253)
> at
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:108)
> at
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:499)
> at
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:337)
> at
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:86)
> at
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:457)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It seems that majority was lost in the middle of transaction. Qpid HA logic
> detected that and restarted the environment, however, the transaction was not
> aborted.
> On closing of the environment the following has been reported:
> {noformat}
> Problem: 3 locks left
> ---- LSN: 0x0/0x2d80----
> ThinLockAddr:1794979774 Owner:1919034342
> -50_Selector-virtualhost-test-iopool_MasterTxn Waiters: (none)
> ---- LSN: 0x0/0x1ed8----
> ThinLockAddr:374788562 Owner:1919034342
> -50_Selector-virtualhost-test-iopool_MasterTxn Waiters: (none)
> ---- LSN: 0x0/0x20db----
> ThinLockAddr:501824883 Owner:1919034342
> -50_Selector-virtualhost-test-iopool_MasterTxn Waiters: (none)
> [BROKER-1] o.a.q.t.u.SpawnedBrokerHolder
> Problem: 1 txns left
> txnId = -50 txnName = 1919034342
> -50_Selector-virtualhost-test-iopool_MasterTxn
> Local Cache Usage = 240
> Cache Layout: Allocation of resources in the cache.
> adminBytes=0
> cacheTotalBytes=1,622,051
> dataAdminBytes=0
> dataBytes=0
> lockBytes=240
> sharedCacheTotalBytes=0
> WARN
> [Environment-test:nodetestTransferMasterWhilstTransactionsInFlight10001]
> o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Ignoring an exception whilst
> closing environment
> com.sleepycat.je.EnvironmentFailureException: (JE 5.0.104)
> nodetestTransferMasterWhilstTransactionsInFlight10001(1):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testTransferMasterWhilstTransactionsInFlight-1-8967653656689695718/test/config
> Problem closing handle
> nodetestTransferMasterWhilstTransactionsInFlight10001(1)
> UNEXPECTED_EXCEPTION: Unexpected internal Exception, may have side effects.
> Environment is invalid and must be closed.
> at
> com.sleepycat.je.rep.ReplicatedEnvironment.close(ReplicatedEnvironment.java:709)
> ~[je-5.0.104.jar:5.0.104]
> at
> org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.closeEnvironmentOnRestart(ReplicatedEnvironmentFacade.java:1277)
> [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at
> org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.restartEnvironment(ReplicatedEnvironmentFacade.java:1253)
> [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at
> org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.access$500(ReplicatedEnvironmentFacade.java:92)
> [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at
> org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$3.run(ReplicatedEnvironmentFacade.java:481)
> [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_80]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_80]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> Caused by: java.lang.IllegalStateException: Unclosed Database: MESSAGE_CONTENT
> Unclosed Database: MESSAGE_CONTENT
> Databases left open: 2
> There is 1 existing transaction opened against the Environment.
> Aborting open transactions ...
> aborting <Transaction id="-46">
> at com.sleepycat.je.Environment.close(Environment.java:390)
> ~[je-5.0.104.jar:5.0.104]
> at
> com.sleepycat.je.rep.ReplicatedEnvironment.close(ReplicatedEnvironment.java:702)
> ~[je-5.0.104.jar:5.0.104]
> ... 7 common frames omitted
> {noformat}
> It appears we need to abort all transactions whilst restarting the environment
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]