[
https://issues.apache.org/jira/browse/QPID-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Rudyy updated QPID-7192:
-----------------------------
Affects Version/s: qpid-java-broker-7.1.0
qpid-java-broker-7.0.6
Fix Version/s: qpid-java-broker-7.1.1
qpid-java-broker-7.0.7
> [Java Broker] BDB HA Virtual Host Node does not restart successfully if JE
> environment has locally committed transactions requiring rollback
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: QPID-7192
> URL: https://issues.apache.org/jira/browse/QPID-7192
> Project: Qpid
> Issue Type: Bug
> Components: Broker-J
> Affects Versions: qpid-java-6.0, qpid-java-6.1, qpid-java-broker-7.1.0,
> qpid-java-broker-7.0.6
> Reporter: Alex Rudyy
> Priority: Major
> Fix For: qpid-java-broker-7.0.7, qpid-java-broker-7.1.1
>
> Attachments: QPID-7192-wip.diff
>
>
> Start-up of a BDB HA Virtual Host Node (VHN) fails in the following scenario:
> 1) A transaction on the master node is in the process of committing when the
> replica BDB HA VHNs are stopped; the transaction is aborted with
> InsufficientReplicasException.
> 2) The replica BDB HA VHNs are restarted, and the replica environment detects
> a transaction from the previous commit that requires rollback. The rollback
> causes the JE Environment to restart.
> 3) The BDB HA VHN for the impacted replica continues activation and performs
> the intruder protection check while the Environment is still restarting,
> which results in the exception "ConnectionScopedRuntimeException: Environment
> is restarting". The exception puts the BDB HA VHN into the ERRORED state. The
> JE environment re-joins the group successfully, but the BDB HA VHN is not
> aware of it: the StateChangeListener is not set, so the BDB HA VHN does not
> receive notifications about state transitions.
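The failure mode in step 3 can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `FakeEnvironment` and `NodeSketch` are hypothetical stand-ins, not Qpid or JE classes, and registering the listener before the check is one possible ordering, not necessarily the fix applied for this issue.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for a replicated environment that may be mid-restart. */
final class FakeEnvironment {
    private boolean restarting;
    private Consumer<String> listener;

    FakeEnvironment(boolean restarting) {
        this.restarting = restarting;
    }

    void setStateChangeListener(Consumer<String> listener) {
        this.listener = listener;
    }

    /** Mirrors the reported failure: access is refused while restarting. */
    void checkIntruders() {
        if (restarting) {
            throw new IllegalStateException("Environment is restarting");
        }
    }

    /** Simulates the environment finishing its restart and rejoining the group. */
    void finishRestart(String newRole) {
        restarting = false;
        if (listener != null) {
            listener.accept(newRole);
        }
    }
}

/** Hypothetical stand-in for the virtual host node's activation logic. */
final class NodeSketch {
    enum State { STOPPED, ACTIVE, ERRORED }

    private State state = State.STOPPED;
    private String role = "UNKNOWN";

    /** listenerFirst=false models the assumed original ordering (check, then listener). */
    State activate(FakeEnvironment env, boolean listenerFirst) {
        if (listenerFirst) {
            env.setStateChangeListener(this::onStateChange);
        }
        try {
            env.checkIntruders();
            if (!listenerFirst) {
                env.setStateChangeListener(this::onStateChange);
            }
            state = State.ACTIVE;
        } catch (IllegalStateException e) {
            // The node is marked ERRORED; without a listener it never hears of the rejoin.
            state = State.ERRORED;
        }
        return state;
    }

    private void onStateChange(String newRole) {
        role = newRole;
        state = State.ACTIVE;
    }

    State getState() { return state; }

    String getRole() { return role; }
}
```

With `listenerFirst` set to `false`, the node stays ERRORED after `finishRestart` is called, which matches the behaviour described above; with `true`, the later state transition reaches the node and it recovers without operator action.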
> Manual operator intervention is required to recover from the issue: the BDB
> HA VHN must be started again by invoking a state change operation from the
> Web Management Console or the REST API.
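For the REST route, the recovery request could look roughly like the sketch below. The servlet path `/api/latest/virtualhostnode/<node name>` is taken from the log in this report. Everything else is an assumption to check against the management documentation for your broker version: the use of POST, the `desiredState` attribute with value `ACTIVE`, basic authentication, and the hypothetical base URL, node name and credentials. It uses `java.net.http`, so it needs Java 11 or later.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds (but does not send) a request asking a virtual host node to become ACTIVE. */
final class RestartNodeRequest {

    static HttpRequest build(String baseUrl, String nodeName, String user, String password) {
        // Basic authentication is an assumption; the broker may be configured differently.
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));

        // Assumed payload: set the node's desired state back to ACTIVE.
        String body = "{\"desiredState\":\"ACTIVE\"}";

        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/latest/virtualhostnode/" + nodeName))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the response status should be checked before assuming the node was restarted.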
>
> These are the JE exceptions reported on the replica side during transaction
> rollback:
> {noformat}
> BROKER-20 WARN [DETACHED
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)]
> o.a.q.s.s.b.r.ReplicatedEnvironmentFacade
> test:nodetestInFlightTransactionsWhilstMajorityIsLost10004 has transaction(s)
> ahead of the current master. These must be discarded to allow this node to
> rejoin the group. This condition is normally caused by the use of weak
> durability options.
> BROKER-20 DEBUG [DETACHED
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)]
> o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Environment restarting due to
> exception (JE 5.0.104)
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
> Node
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
> must rollback 3 commits to the earliest point indicated by transaction
> id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to
> rejoin the replication group. All existing ReplicatedEnvironment handles must
> be closed and reinstantiated. Log files were truncated to file 0x0, offset
> 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or
> abort. Must run recovery by re-opening Environment handles Environment is
> invalid and must be closed.
> com.sleepycat.je.rep.RollbackException: (JE 5.0.104)
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
> Node
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
> must rollback 3 commits to the earliest point indicated by transaction
> id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to
> rejoin the replication group. All existing ReplicatedEnvironment handles must
> be closed and reinstantiated. Log files were truncated to file 0x0, offset
> 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or
> abort. Must run recovery by re-opening Environment handles Environment is
> invalid and must be closed.
> at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupHardRecovery(ReplicaFeederSyncup.java:650) ~[je-5.0.104.jar:5.0.104]
> at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:341) ~[je-5.0.104.jar:5.0.104]
> at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:148) ~[je-5.0.104.jar:5.0.104]
> at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:605) ~[je-5.0.104.jar:5.0.104]
> at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:396) ~[je-5.0.104.jar:5.0.104]
> at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:332) ~[je-5.0.104.jar:5.0.104]
> at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1506) ~[je-5.0.104.jar:5.0.104]
> {noformat}
> The rollback above then causes the following exception during activation:
> {noformat}
> BROKER-20 DEBUG [HttpManagement-http-74]
> o.a.q.s.m.p.f.ExceptionHandlingFilter Exception in servlet
> '/api/latest/virtualhostnode/nodetestInFlightTransactionsWhilstMajorityIsLost10004':
> org.apache.qpid.server.util.ConnectionScopedRuntimeException: Environment is
> restarting
> at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.getEnvironment(ReplicatedEnvironmentFacade.java:1342) ~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.getNodes(ReplicatedEnvironmentFacade.java:965) ~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.virtualhostnode.berkeleydb.BDBHAVirtualHostNodeImpl.activate(BDBHAVirtualHostNodeImpl.java:361) ~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.virtualhostnode.AbstractVirtualHostNode.doActivate(AbstractVirtualHostNode.java:160) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_80]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_80]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_80]
> at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80]
> at org.apache.qpid.server.model.AbstractConfiguredObject.attainState(AbstractConfiguredObject.java:1308) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.model.AbstractConfiguredObject.attainState(AbstractConfiguredObject.java:1287) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.model.AbstractConfiguredObject.attainStateIfOpenedOrReopenFailed(AbstractConfiguredObject.java:1271) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.model.AbstractConfiguredObject.access$1700(AbstractConfiguredObject.java:80) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.model.AbstractConfiguredObject$15.execute(AbstractConfiguredObject.java:1519) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.model.AbstractConfiguredObject$15.execute(AbstractConfiguredObject.java:1455) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.model.AbstractConfiguredObject$2.execute(AbstractConfiguredObject.java:561) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.model.AbstractConfiguredObject$2.execute(AbstractConfiguredObject.java:554) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.configuration.updater.TaskExecutorImpl$TaskLoggingWrapper.execute(TaskExecutorImpl.java:270) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at org.apache.qpid.server.configuration.updater.TaskExecutorImpl$CallableWrapper$1.run(TaskExecutorImpl.java:342) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_80]
> at javax.security.auth.Subject.doAs(Subject.java:356) ~[na:1.7.0_80]
> at org.apache.qpid.server.configuration.updater.TaskExecutorImpl$CallableWrapper.call(TaskExecutorImpl.java:335) ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> {noformat}
> The test MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost fails
> sporadically because of this issue.