[ https://issues.apache.org/jira/browse/QPID-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Rudyy updated QPID-7192: ----------------------------- Summary: [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback (was: [Java Broker] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback) > [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE > environment has locally committed transactions requiring rollback > ----------------------------------------------------------------------------------------------------------------------------------------- > > Key: QPID-7192 > URL: https://issues.apache.org/jira/browse/QPID-7192 > Project: Qpid > Issue Type: Bug > Components: Broker-J > Affects Versions: qpid-java-6.0, qpid-java-6.1, qpid-java-broker-7.1.0, > qpid-java-broker-7.0.6 > Reporter: Alex Rudyy > Priority: Major > Fix For: qpid-java-broker-7.0.7, qpid-java-broker-7.1.1 > > Attachments: QPID-7192-wip.diff > > > Start-up of BDB HA VHN fails in the following scenario: > 1) When Transaction on Master node is in process of commit and the replicas > BDB HA VHNs are stopped at the same time, the transaction is aborted with > InsufficientReplicasException > 2) BDB HA VHN for replicas are restarted and the Replica environment detects > a transaction requiring rollback from previous commit. The rollback > transaction causes restart of JE Environment. > 3) BDB HA VHN for impacted replica continues activation and performs an > intruder protection checks but Environment is restarting which results in > exception "ConnectionScopedRuntimeException: Environment is restarting". The > exception puts BDB HA VHN into ERRORED state. JE environment re-join the > group successfully but BDB HA VHN is not aware about it, as > StateChangeListener is not set and BDB HA VHN does not receive notifications > about state transitions > A manual operator intervention is required to recover from the issue: BDB HA > VHN needs to be started up again by invoking state change operation from Web > Management Console or REST API. > > Here are JE Exceptions reported on Replica side on transaction rollback: > {noformat} > BROKER-20 WARN [DETACHED > nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] > o.a.q.s.s.b.r.ReplicatedEnvironmentFacade > test:nodetestInFlightTransactionsWhilstMajorityIsLost10004 has transaction(s) > ahead of the current master. These must be discarded to allow this node to > rejoin the group. This condition is normally caused by the use of weak > durability options. > BROKER-20 DEBUG [DETACHED > nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] > o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Environment restarting due to > exception (JE 5.0.104) > nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config > Node > nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config > must rollback 3 commits to the earliest point indicated by transaction > id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to > rejoin the replication group. All existing ReplicatedEnvironment handles must > be closed and reinstantiated. Log files were truncated to file 0x0, offset > 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or > abort. Must run recovery by re-opening Environment handles Environment is > invalid and must be closed. > com.sleepycat.je.rep.RollbackException: (JE 5.0.104) > nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config > Node > nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config > must rollback 3 commits to the earliest point indicated by transaction > id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to > rejoin the replication group. All existing ReplicatedEnvironment handles must > be closed and reinstantiated. Log files were truncated to file 0x0, offset > 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or > abort. Must run recovery by re-opening Environment handles Environment is > invalid and must be closed. > at > com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupHardRecovery(ReplicaFeederSyncup.java:650) > ~[je-5.0.104.jar:5.0.104] > at > com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:341) > ~[je-5.0.104.jar:5.0.104] > at > com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:148) > ~[je-5.0.104.jar:5.0.104] > at > com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:605) > ~[je-5.0.104.jar:5.0.104] > at > com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:396) > ~[je-5.0.104.jar:5.0.104] > at > com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:332) > ~[je-5.0.104.jar:5.0.104] > at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1506) > ~[je-5.0.104.jar:5.0.104] > {noformat} > The above causes the below > {noformat} > BROKER-20 DEBUG [HttpManagement-http-74] > o.a.q.s.m.p.f.ExceptionHandlingFilter Exception in servlet > '/api/latest/virtualhostnode/nodetestInFlightTransactionsWhilstMajorityIsLost10004': > org.apache.qpid.server.util.ConnectionScopedRuntimeException: Environment is > restarting > at > org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.getEnvironment(ReplicatedEnvironmentFacade.java:1342) > ~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.getNodes(ReplicatedEnvironmentFacade.java:965) > ~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.virtualhostnode.berkeleydb.BDBHAVirtualHostNodeImpl.activate(BDBHAVirtualHostNodeImpl.java:361) > ~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.virtualhostnode.AbstractVirtualHostNode.doActivate(AbstractVirtualHostNode.java:160) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.7.0_80] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > ~[na:1.7.0_80] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.7.0_80] > at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80] > at > org.apache.qpid.server.model.AbstractConfiguredObject.attainState(AbstractConfiguredObject.java:1308) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.model.AbstractConfiguredObject.attainState(AbstractConfiguredObject.java:1287) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.model.AbstractConfiguredObject.attainStateIfOpenedOrReopenFailed(AbstractConfiguredObject.java:1271) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.model.AbstractConfiguredObject.access$1700(AbstractConfiguredObject.java:80) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.model.AbstractConfiguredObject$15.execute(AbstractConfiguredObject.java:1519) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.model.AbstractConfiguredObject$15.execute(AbstractConfiguredObject.java:1455) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.model.AbstractConfiguredObject$2.execute(AbstractConfiguredObject.java:561) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.model.AbstractConfiguredObject$2.execute(AbstractConfiguredObject.java:554) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.configuration.updater.TaskExecutorImpl$TaskLoggingWrapper.execute(TaskExecutorImpl.java:270) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at > org.apache.qpid.server.configuration.updater.TaskExecutorImpl$CallableWrapper$1.run(TaskExecutorImpl.java:342) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at java.security.AccessController.doPrivileged(Native Method) > ~[na:1.7.0_80] > at javax.security.auth.Subject.doAs(Subject.java:356) ~[na:1.7.0_80] > at > org.apache.qpid.server.configuration.updater.TaskExecutorImpl$CallableWrapper.call(TaskExecutorImpl.java:335) > ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > ~[na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > {noformat} > Test MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost sporadically > fails because of this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org