[
https://issues.apache.org/jira/browse/QPID-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Keith Wall updated QPID-7078:
-----------------------------
Summary: [Java Broker,HA] BDB HA VHN in master role designated as primary
can sporadically transit into unknown role after losing second replica node
(was: [Java Broker,HA] BDB HA VHN in master role designated as primary can
sporadically transit into unknown role after loosing second replica node)
> [Java Broker,HA] BDB HA VHN in master role designated as primary can
> sporadically transit into unknown role after losing second replica node
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: QPID-7078
> URL: https://issues.apache.org/jira/browse/QPID-7078
> Project: Qpid
> Issue Type: Bug
> Components: Java Broker
> Affects Versions: 0.32, qpid-java-6.0, qpid-java-6.0.1, qpid-java-6.1
> Reporter: Alex Rudyy
> Attachments:
> TEST-org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped.txt
>
>
> A failure of the test
> TwoNodeTest#testDesignatedPrimaryContinuesAfterSecondaryStopped revealed
> unexpected behavior in BDB JE: a master node designated as primary
> suddenly transitions into the UNKNOWN role after the second replica node
> is shut down.
> The test failed as below:
> {noformat}
> testDesignatedPrimaryContinuesAfterSecondaryStopped(org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest) Time elapsed: 7.236 sec <<< ERROR!
> javax.jms.JMSException: Error registering consumer: org.apache.qpid.QpidException: Fail-over exception interrupted basic consume.
> at org.apache.qpid.client.AMQSession.registerConsumer(AMQSession.java:3093)
> at org.apache.qpid.client.AMQSession.access$400(AMQSession.java:94)
> at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2094)
> at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2069)
> at org.apache.qpid.client.AMQConnectionDelegate_8_0.executeRetrySupport(AMQConnectionDelegate_8_0.java:416)
> at org.apache.qpid.client.AMQConnection.executeRetrySupport(AMQConnection.java:737)
> at org.apache.qpid.client.failover.FailoverRetrySupport.execute(FailoverRetrySupport.java:90)
> at org.apache.qpid.client.AMQSession.createConsumerImpl(AMQSession.java:2067)
> at org.apache.qpid.client.AMQSession.createConsumer(AMQSession.java:989)
> at org.apache.qpid.client.AMQConnection.retrieveVirtualHostPropertiesIfNecessary(AMQConnection.java:809)
> at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:796)
> at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:771)
> at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:765)
> at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:88)
> at org.apache.qpid.test.utils.QpidBrokerTestCase.assertProducingConsuming(QpidBrokerTestCase.java:1256)
> at org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped(TwoNodeTest.java:108)
> Caused by: org.apache.qpid.client.failover.FailoverException: Failing over about to start
> at org.apache.qpid.client.AMQProtocolHandler.notifyFailoverStarting(AMQProtocolHandler.java:434)
> at org.apache.qpid.client.AMQProtocolHandler$1.run(AMQProtocolHandler.java:287)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> On the broker side, the transition into the UNKNOWN state occurred as below:
> {noformat}
> 10:15:44,279 B-10000 DEBUG [Group-Change-Learner:test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.DatabasePinger Ping transaction completed
> 10:15:44,279 B-10000 DEBUG [IO-/127.0.0.1:58662] o.a.q.s.p.v.BrokerDecoder Frame handled in 1344 ms.
> 10:15:44,279 B-10000 INFO [MASTER nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001(1)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade The node 'test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001' state is UNKNOWN
> 10:15:44,279 B-10000 DEBUG [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Received BDB event, new BDB state UNKNOWN Facade state : OPEN
> 10:15:44,279 B-10000 INFO [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.v.b.BDBHAVirtualHostNodeImpl Received BDB event indicating transition from state MASTER to UNKNOWN for nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001
> 10:15:44,280 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]']
> 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.m.AbstractConfiguredObject Closing BDBHAVirtualHostImpl : test
> 2016-02-17 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.v.AbstractVirtualHost Closing connection registry :1 connections.
> 10:15:44,282 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]'] performed successfully with result: null
> 10:15:44,283 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on '/127.0.0.1:58662(guest)']
> 10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.m.AbstractConfiguredObject Closing AMQPConnection_0_8 : [1] 127.0.0.1:58662
> 10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on '/127.0.0.1:58662(guest)'] performed successfully with result: null
> {noformat}
> The transition into the UNKNOWN state should not happen, as the MASTER node
> is designated as primary. The exhibited behavior points to a BDB JE bug.
> It is unclear whether the JE environment can recover from this unexpected
> flip into the UNKNOWN state. If JE can recover, then on the next transition
> into MASTER the VHN should recover the VH, and connected applications can
> continue as usual. If JE cannot recover, then the BDB HA VHN will not recover
> automatically from this condition, as we do not restart the environment on
> MasterUnknownException. Operator intervention would then be required to
> restart the BDB HA VHN.
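
The two outcomes described above (the designated primary returns to MASTER on its own, or it stays out of MASTER until an operator restarts the node) can be illustrated with a small state tracker. This is only a sketch under stated assumptions, not Qpid or BDB JE code: `NodeState` and `RoleTransitionMonitor` are hypothetical names, and the enum merely mirrors the state names seen in the broker log.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the node states named in the broker log. */
enum NodeState { MASTER, REPLICA, UNKNOWN, DETACHED }

/**
 * Hypothetical helper (not part of Qpid or BDB JE) that flags a designated
 * primary leaving MASTER for UNKNOWN and notes whether it later recovers.
 */
final class RoleTransitionMonitor {
    private final boolean designatedPrimary;
    private final List<String> findings = new ArrayList<>();
    private NodeState current;
    private boolean awaitingRecovery;

    RoleTransitionMonitor(boolean designatedPrimary, NodeState initial) {
        this.designatedPrimary = designatedPrimary;
        this.current = initial;
    }

    /** Record a state change event and classify it. */
    void onStateChange(NodeState next) {
        if (designatedPrimary
                && current == NodeState.MASTER
                && next == NodeState.UNKNOWN) {
            // The transition this issue reports as unexpected.
            awaitingRecovery = true;
            findings.add("UNEXPECTED: designated primary went MASTER -> UNKNOWN");
        } else if (awaitingRecovery && next == NodeState.MASTER) {
            // The environment recovered without a restart.
            awaitingRecovery = false;
            findings.add("RECOVERED: node returned to MASTER");
        }
        current = next;
    }

    /** True while the node has not returned to MASTER after the unexpected flip. */
    boolean needsOperatorAttention() {
        return awaitingRecovery;
    }

    /** Copy of the recorded findings, oldest first. */
    List<String> findings() {
        return new ArrayList<>(findings);
    }
}
```

Feeding the monitor the MASTER to UNKNOWN event from the log would set `needsOperatorAttention()` to true; a later MASTER event would clear it, which corresponds to the "JE can recover" case.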