[ 
https://issues.apache.org/jira/browse/QPID-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173844#comment-15173844
 ] 

Keith Wall edited comment on QPID-7078 at 3/1/16 2:48 PM:
----------------------------------------------------------

This appears to be a JE defect.  We don't think that a node that is designated 
primary should ever flip to the unknown role. If another occurrence is seen, we 
will see whether Qpid can be changed to work around it.


was (Author: k-wall):
This appears to be a JE defect.  If another occurrence is seen, we will see if 
we can change Qpid to work around.

> [Java Broker,HA] BDB HA VHN in master role designated as primary can 
> sporadically transit into unknown role after losing second replica node
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: QPID-7078
>                 URL: https://issues.apache.org/jira/browse/QPID-7078
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: 0.32, qpid-java-6.0, qpid-java-6.0.1, qpid-java-6.1
>            Reporter: Alex Rudyy
>         Attachments: 
> TEST-org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped.txt
>
>
> Failure of test 
> TwoNodeTest#testDesignatedPrimaryContinuesAfterSecondaryStopped revealed 
> unexpected behavior in BDB JE: a master node designated as primary 
> suddenly transitions into the UNKNOWN role after the second replica node 
> is shut down.
> The test failed as follows:
> {noformat}
> testDesignatedPrimaryContinuesAfterSecondaryStopped(org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest)  Time elapsed: 7.236 sec  <<< ERROR!
> javax.jms.JMSException: Error registering consumer: org.apache.qpid.QpidException: Fail-over exception interrupted basic consume.
>       at org.apache.qpid.client.AMQSession.registerConsumer(AMQSession.java:3093)
>       at org.apache.qpid.client.AMQSession.access$400(AMQSession.java:94)
>       at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2094)
>       at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2069)
>       at org.apache.qpid.client.AMQConnectionDelegate_8_0.executeRetrySupport(AMQConnectionDelegate_8_0.java:416)
>       at org.apache.qpid.client.AMQConnection.executeRetrySupport(AMQConnection.java:737)
>       at org.apache.qpid.client.failover.FailoverRetrySupport.execute(FailoverRetrySupport.java:90)
>       at org.apache.qpid.client.AMQSession.createConsumerImpl(AMQSession.java:2067)
>       at org.apache.qpid.client.AMQSession.createConsumer(AMQSession.java:989)
>       at org.apache.qpid.client.AMQConnection.retrieveVirtualHostPropertiesIfNecessary(AMQConnection.java:809)
>       at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:796)
>       at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:771)
>       at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:765)
>       at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:88)
>       at org.apache.qpid.test.utils.QpidBrokerTestCase.assertProducingConsuming(QpidBrokerTestCase.java:1256)
>       at org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped(TwoNodeTest.java:108)
> Caused by: org.apache.qpid.client.failover.FailoverException: Failing over about to start
>       at org.apache.qpid.client.AMQProtocolHandler.notifyFailoverStarting(AMQProtocolHandler.java:434)
>       at org.apache.qpid.client.AMQProtocolHandler$1.run(AMQProtocolHandler.java:287)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> On the broker side, the transition into the UNKNOWN state occurred as follows:
> {noformat}
> 10:15:44,279 B-10000 DEBUG [Group-Change-Learner:test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.DatabasePinger Ping transaction completed
> 10:15:44,279 B-10000 DEBUG [IO-/127.0.0.1:58662] o.a.q.s.p.v.BrokerDecoder Frame handled in 1344 ms.
> 10:15:44,279 B-10000 INFO  [MASTER nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001(1)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade The node 'test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001' state is UNKNOWN
> 10:15:44,279 B-10000 DEBUG [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Received BDB event, new BDB state UNKNOWN Facade state : OPEN
> 10:15:44,279 B-10000 INFO  [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.v.b.BDBHAVirtualHostNodeImpl Received BDB event indicating transition from state MASTER to UNKNOWN for nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001
> 10:15:44,280 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]']
> 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.m.AbstractConfiguredObject Closing BDBHAVirtualHostImpl : test
> 2016-02-17 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.v.AbstractVirtualHost Closing connection registry :1 connections.
> 10:15:44,282 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]'] performed successfully with result: null
> 10:15:44,283 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on '/127.0.0.1:58662(guest)']
> 10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.m.AbstractConfiguredObject Closing AMQPConnection_0_8 : [1] 127.0.0.1:58662
> 10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on '/127.0.0.1:58662(guest)'] performed successfully with result: null
> {noformat}
> The transition into the UNKNOWN state should not happen, as the MASTER node 
> is designated as primary. The exhibited behavior indicates a BDB JE bug.
> It is unclear whether the JE Environment can recover from this unexpected 
> flip into the UNKNOWN state. If JE can recover, then on the next transition 
> into MASTER the VHN should recover the VH, and connected applications can 
> continue as usual. If JE cannot recover, then the BDB HA VHN will not 
> recover automatically from this condition, as we do not restart the 
> environment on MasterUnknownException. Operator intervention would be 
> required to restart the BDB HA VHN.
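
The recovery question above could be approached with a small watchdog on the
Qpid side. The following is only a sketch under stated assumptions, not Qpid
or JE code: the class PrimaryStateWatchdog, its NodeState enum and the
grace-period parameter are all hypothetical names introduced for illustration.
It treats a MASTER to UNKNOWN flip on a designated primary as suspicious, gives
JE a grace period to return to another role by itself, and only then reports
that a restart is needed.

```java
import java.util.Objects;

/**
 * Hypothetical watchdog (not part of Qpid or BDB JE). It tracks the role
 * reported for a node and decides when an automatic restart looks warranted.
 */
final class PrimaryStateWatchdog {

    /** Role names as they appear in the broker log; declared locally for this sketch. */
    enum NodeState { MASTER, REPLICA, UNKNOWN, DETACHED }

    private final boolean designatedPrimary;
    private final long gracePeriodMillis;
    private NodeState current = NodeState.DETACHED;
    // Time at which a designated primary flipped from MASTER to UNKNOWN, or -1 if none is pending.
    private long unknownSinceMillis = -1L;

    PrimaryStateWatchdog(boolean designatedPrimary, long gracePeriodMillis) {
        if (gracePeriodMillis < 0) {
            throw new IllegalArgumentException("gracePeriodMillis must be >= 0");
        }
        this.designatedPrimary = designatedPrimary;
        this.gracePeriodMillis = gracePeriodMillis;
    }

    /** Records a role change observed at the given time (milliseconds, any monotonic origin). */
    void onStateChange(NodeState newState, long nowMillis) {
        Objects.requireNonNull(newState, "newState");
        boolean unexpectedFlip = designatedPrimary
                && current == NodeState.MASTER
                && newState == NodeState.UNKNOWN;
        if (unexpectedFlip) {
            // Start the grace period: JE may still recover without help.
            unknownSinceMillis = nowMillis;
        } else if (newState != NodeState.UNKNOWN) {
            // Any role other than UNKNOWN ends the suspicious episode.
            unknownSinceMillis = -1L;
        }
        current = newState;
    }

    /** True once the node has stayed UNKNOWN for the whole grace period after an unexpected flip. */
    boolean restartNeeded(long nowMillis) {
        return unknownSinceMillis >= 0
                && nowMillis - unknownSinceMillis >= gracePeriodMillis;
    }

    public static void main(String[] args) {
        PrimaryStateWatchdog watchdog = new PrimaryStateWatchdog(true, 1000L);
        watchdog.onStateChange(NodeState.MASTER, 0L);
        watchdog.onStateChange(NodeState.UNKNOWN, 100L);
        System.out.println("restart needed at 500 ms:  " + watchdog.restartNeeded(500L));
        System.out.println("restart needed at 1100 ms: " + watchdog.restartNeeded(1100L));
    }
}
```

In a real broker the calls to onStateChange would presumably be driven by the
BDB state change events already visible in the log above, and a true result
from restartNeeded would trigger the environment restart that currently needs
an operator. Whether restarting is safe in this situation is exactly the open
question in the description, so the grace period and the restart action are
assumptions rather than a tested fix.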



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
