[ 
https://issues.apache.org/jira/browse/QPID-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rudyy updated QPID-7192:
-----------------------------
    Summary: [Broker-J] BDB HA Virtual Host Node does not restart successfully 
if JE environment has locally committed transactions requiring rollback  (was: 
[Java Broker] BDB HA Virtual Host Node does not restart successfully if JE 
environment has locally committed transactions requiring rollback)

> [Broker-J] BDB HA Virtual Host Node does not restart successfully if JE 
> environment has locally committed transactions requiring rollback
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: QPID-7192
>                 URL: https://issues.apache.org/jira/browse/QPID-7192
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-6.0, qpid-java-6.1, qpid-java-broker-7.1.0, 
> qpid-java-broker-7.0.6
>            Reporter: Alex Rudyy
>            Priority: Major
>             Fix For: qpid-java-broker-7.0.7, qpid-java-broker-7.1.1
>
>         Attachments: QPID-7192-wip.diff
>
>
> Start-up of a BDB HA Virtual Host Node (VHN) fails in the following scenario:
> 1) A transaction on the master node is in the process of committing when the 
> replica BDB HA VHNs are stopped. The transaction is aborted with 
> InsufficientReplicasException.
> 2) The replica BDB HA VHNs are restarted, and the replica environment detects 
> a transaction from the previous commit that requires rollback. The rollback 
> causes the JE Environment to restart.
> 3) The BDB HA VHN on the affected replica continues activation and performs 
> its intruder protection check while the Environment is still restarting, 
> which results in the exception "ConnectionScopedRuntimeException: 
> Environment is restarting". The exception puts the BDB HA VHN into the 
> ERRORED state. The JE environment rejoins the group successfully, but the 
> BDB HA VHN is not aware of it: the StateChangeListener is not set, so the 
> BDB HA VHN does not receive notifications about state transitions.
> Manual operator intervention is required to recover: the BDB HA VHN must 
> be started again by invoking a state change operation from the Web 
> Management Console or the REST API.
>  
> The replica reports the following JE exceptions on transaction rollback:
> {noformat}
> BROKER-20 WARN  [DETACHED 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] 
> o.a.q.s.s.b.r.ReplicatedEnvironmentFacade 
> test:nodetestInFlightTransactionsWhilstMajorityIsLost10004 has transaction(s) 
> ahead of the current master. These must be discarded to allow this node to 
> rejoin the group. This condition is normally caused by the use of weak 
> durability options.
> BROKER-20 DEBUG [DETACHED 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] 
> o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Environment restarting due to 
> exception (JE 5.0.104) 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
>  Node 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
>  must rollback 3 commits to the earliest point indicated by transaction 
> id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to 
> rejoin the replication group. All existing ReplicatedEnvironment handles must 
> be closed and reinstantiated.  Log files were truncated to file 0x0, offset 
> 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or 
> abort. Must run recovery by re-opening Environment handles Environment is 
> invalid and must be closed.
> com.sleepycat.je.rep.RollbackException: (JE 5.0.104) 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
>  Node 
> nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
>  must rollback 3 commits to the earliest point indicated by transaction 
> id=-142 time=2016-04-08 00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to 
> rejoin the replication group. All existing ReplicatedEnvironment handles must 
> be closed and reinstantiated.  Log files were truncated to file 0x0, offset 
> 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or 
> abort. Must run recovery by re-opening Environment handles Environment is 
> invalid and must be closed.
>         at 
> com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupHardRecovery(ReplicaFeederSyncup.java:650)
>  ~[je-5.0.104.jar:5.0.104]
>         at 
> com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:341)
>  ~[je-5.0.104.jar:5.0.104]
>         at 
> com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:148)
>  ~[je-5.0.104.jar:5.0.104]
>         at 
> com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:605) 
> ~[je-5.0.104.jar:5.0.104]
>         at 
> com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:396)
>  ~[je-5.0.104.jar:5.0.104]
>         at 
> com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:332) 
> ~[je-5.0.104.jar:5.0.104]
>         at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1506) 
> ~[je-5.0.104.jar:5.0.104]
> {noformat}
> The rollback above then causes the following exception during activation:
> {noformat}
> BROKER-20 DEBUG [HttpManagement-http-74] 
> o.a.q.s.m.p.f.ExceptionHandlingFilter Exception in servlet 
> '/api/latest/virtualhostnode/nodetestInFlightTransactionsWhilstMajorityIsLost10004':
> org.apache.qpid.server.util.ConnectionScopedRuntimeException: Environment is 
> restarting
>         at 
> org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.getEnvironment(ReplicatedEnvironmentFacade.java:1342)
>  ~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.getNodes(ReplicatedEnvironmentFacade.java:965)
>  ~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.virtualhostnode.berkeleydb.BDBHAVirtualHostNodeImpl.activate(BDBHAVirtualHostNodeImpl.java:361)
>  ~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.virtualhostnode.AbstractVirtualHostNode.doActivate(AbstractVirtualHostNode.java:160)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.7.0_80]
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> ~[na:1.7.0_80]
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.7.0_80]
>         at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80]
>         at 
> org.apache.qpid.server.model.AbstractConfiguredObject.attainState(AbstractConfiguredObject.java:1308)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.model.AbstractConfiguredObject.attainState(AbstractConfiguredObject.java:1287)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.model.AbstractConfiguredObject.attainStateIfOpenedOrReopenFailed(AbstractConfiguredObject.java:1271)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.model.AbstractConfiguredObject.access$1700(AbstractConfiguredObject.java:80)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.model.AbstractConfiguredObject$15.execute(AbstractConfiguredObject.java:1519)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.model.AbstractConfiguredObject$15.execute(AbstractConfiguredObject.java:1455)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.model.AbstractConfiguredObject$2.execute(AbstractConfiguredObject.java:561)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.model.AbstractConfiguredObject$2.execute(AbstractConfiguredObject.java:554)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.configuration.updater.TaskExecutorImpl$TaskLoggingWrapper.execute(TaskExecutorImpl.java:270)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at 
> org.apache.qpid.server.configuration.updater.TaskExecutorImpl$CallableWrapper$1.run(TaskExecutorImpl.java:342)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) 
> ~[na:1.7.0_80]
>         at javax.security.auth.Subject.doAs(Subject.java:356) ~[na:1.7.0_80]
>         at 
> org.apache.qpid.server.configuration.updater.TaskExecutorImpl$CallableWrapper.call(TaskExecutorImpl.java:335)
>  ~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[na:1.7.0_80]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_80]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> {noformat}
> Test MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost sporadically 
> fails because of this issue.
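
The failure in step 3 of the scenario is a race: activation queries the environment while it is restarting after the rollback. The Java sketch below illustrates one possible mitigation, retrying the query until the restart completes. It is a simulation under stated assumptions, not the change in QPID-7192-wip.diff and not Qpid or JE code: ActivationRetrySketch, EnvironmentRestartingException, SimulatedFacade, getNodeCount and retryWhileRestarting are all hypothetical names, and the node count of 3 is an arbitrary stand-in value.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types exist in Qpid Broker-J or JE. */
public class ActivationRetrySketch {

    /** Stand-in for the "Environment is restarting" failure seen in the stack trace. */
    static class EnvironmentRestartingException extends RuntimeException {
        EnvironmentRestartingException() {
            super("Environment is restarting");
        }
    }

    /** Simulated facade: the first N calls fail as if a rollback restart were in progress. */
    static class SimulatedFacade {
        private final AtomicInteger remainingFailures;

        SimulatedFacade(int failuresBeforeReady) {
            this.remainingFailures = new AtomicInteger(failuresBeforeReady);
        }

        /** Assumed stand-in for the group query made during activation. */
        int getNodeCount() {
            if (remainingFailures.getAndDecrement() > 0) {
                throw new EnvironmentRestartingException();
            }
            return 3; // arbitrary group size for the simulation
        }
    }

    /** Retries the action while the environment reports that it is restarting. */
    static <T> T retryWhileRestarting(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        EnvironmentRestartingException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (EnvironmentRestartingException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and give up instead of looping.
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while waiting for restart", ie);
                    }
                }
            }
        }
        // Still restarting after all attempts: surface the original failure.
        throw last;
    }

    public static void main(String[] args) {
        SimulatedFacade facade = new SimulatedFacade(2);
        int nodes = retryWhileRestarting(facade::getNodeCount, 5, 10L);
        System.out.println("Activation check saw " + nodes + " nodes");
    }
}
```

A bounded retry only narrows the window. The description also points at the missing StateChangeListener, so a fix along those lines would presumably register the listener before the intruder check so that later state transitions still reach the node.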


