[
https://issues.apache.org/jira/browse/ARTEMIS-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Justin Bertram resolved ARTEMIS-2048.
-------------------------------------
Resolution: Not A Bug
Suspending a process at the OS level is a unique operation that doesn't mimic a
real-world use-case (e.g. a hardware or software crash), and it's not something
I would expect an administrator to do during normal operation. Therefore, I
don't believe that issues found for this use-case merit real investigation. If
you can demonstrate an issue with a real-world use-case please re-open this
issue with an explanation and steps to reproduce. Thanks!
> JCA RA does not failover to backup until TCP connect fails
> ----------------------------------------------------------
>
> Key: ARTEMIS-2048
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2048
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Affects Versions: 2.6.2
> Environment: Latest Payara (Full 182)
> JDK 1.8
> Windows machine (same on 7 x64 and 10 x64)
> Reporter: Jozef Tomek
> Priority: Major
> Labels: Failover, HA, JCA, RAR
>
> In a cluster configuration with HA replication and UDP broadcast discovery,
> when both master and backup are properly started and then the *process for
> the master node is suspended at the OS level (Windows)*, the Artemis JCA
> resource adapter does not recognize that the live broker is stuck and will
> not fail over to the backup until TCP connections to the master start being
> refused.
>
> If the cluster connection on the nodes is configured with low enough
> timeouts, the backup node recognizes the problem in a reasonable time and
> becomes live. The JCA RA, however, will not connect to the new live broker
> for several minutes. This is because the calls to
> {code:java}
> 1094: createConnector()
> 1096: openTransportConnection(liveConnector){code}
> in
> {code:java}
> org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(){code}
>
> will not return null (which would be the signal to attempt failover), and
> thus the attempt to communicate with the stuck master fails later at
> {code:java}
> 911: clientProtocolManager.checkForFailover(liveNodeID){code}
> in
> {code:java}
> org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnection(){code}
>
> which causes errors when trying to use a connection to the broker (both
> explicit usage and MDBs).
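
The control flow described above can be sketched with a small in-memory model. This is not Artemis code: `FailoverSignalModel`, `EndpointState`, `Transport`, `open`, and `select` are hypothetical names invented for illustration, and the model only assumes what the report states, namely that a null transport is the signal to try the backup and that a stalled live broker still yields a non-null transport.

```java
import java.util.concurrent.TimeoutException;

public class FailoverSignalModel {

    /** Hypothetical states of a broker endpoint as seen from the client. */
    public enum EndpointState { DOWN, STALLED, HEALTHY }

    /** Hypothetical transport handle; not an Artemis type. */
    public static final class Transport {
        public final String name;
        public final EndpointState state;

        Transport(String name, EndpointState state) {
            this.name = name;
            this.state = state;
        }

        /** Application-level exchange: only a healthy peer ever answers. */
        public void handshake() throws TimeoutException {
            if (state != EndpointState.HEALTHY) {
                throw new TimeoutException(name + " did not answer");
            }
        }
    }

    /** Returns null only when the connect itself fails (endpoint DOWN). */
    public static Transport open(String name, EndpointState state) {
        return state == EndpointState.DOWN ? null : new Transport(name, state);
    }

    /** Tries live first and falls back to backup only on a null transport. */
    public static Transport select(EndpointState live, EndpointState backup) {
        Transport transport = open("live", live);
        return transport != null ? transport : open("backup", backup);
    }

    public static void main(String[] args) {
        EndpointState[] liveStates = { EndpointState.DOWN, EndpointState.STALLED };
        for (EndpointState live : liveStates) {
            Transport chosen = select(live, EndpointState.HEALTHY);
            try {
                chosen.handshake();
                System.out.println("live " + live + ": using " + chosen.name);
            } catch (TimeoutException e) {
                // The stalled live was selected, so the error surfaces only here.
                System.out.println("live " + live + ": stuck, " + e.getMessage());
            }
        }
    }
}
```

In this model a crashed live broker leads to the backup being selected, while a stalled one is selected itself and fails only at the later handshake, which matches the sequence the report describes.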
>
> Most of the time, the JCA adapter eventually recognizes that the live
> broker is gone, fails over, and everything starts working again.
> Several times, with my (other) prototype app, I was however able to get
> the adapter stuck in a way that, even though the slave (now live) was
> running just fine, either:
> * failover happened but somehow not for MDBs: the app could explicitly
> publish messages (getting a new usable connection from the pool), but MDBs
> were no longer consuming from queues
> * failover did not happen at all and neither publishing nor consuming
> worked anymore
> For this, however, I don't have reliable reproduction steps yet.
> The theory about TCP connections is supported by telnetting to the
> suspended master's port. For several minutes after the suspend, telnet can
> connect just fine, and that changes exactly when I see messages in the
> server logs about failing over to the backup.
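
The telnet observation can be approximated on loopback with plain `java.net` sockets. The sketch below is an illustration under assumptions, not a reproduction of the reported setup: it stands in for the suspended broker with a listening socket whose owner never calls `accept()`, and the class and method names (`StalledListenerDemo`, `canConnect`, `connectsButStaysSilent`) are made up for this example. The operating system completes the TCP handshake on behalf of the listener, so the connect succeeds even though nothing at the application level ever responds.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StalledListenerDemo {

    /** True if a TCP connection to host:port is established within the timeout. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or connect timed out
        }
    }

    /** True if the connect succeeds but the peer sends nothing before the read times out. */
    public static boolean connectsButStaysSilent(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            try {
                socket.connect(new InetSocketAddress(host, port), timeoutMs);
            } catch (IOException e) {
                return false; // never connected at all
            }
            socket.setSoTimeout(timeoutMs);
            try {
                socket.getInputStream().read(); // blocks until data, EOF, or timeout
                return false; // the peer answered or closed the connection
            } catch (SocketTimeoutException e) {
                return true; // connected, but no application-level response
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        String host = loopback.getHostAddress();
        // The listener is bound but accept() is never called, which stands in
        // for a process that holds its port open without making progress.
        try (ServerSocket listener = new ServerSocket(0, 50, loopback)) {
            int port = listener.getLocalPort();
            System.out.println("connect succeeds: " + canConnect(host, port, 1000));
            System.out.println("silent after connect: "
                    + connectsButStaysSilent(host, port, 500));
        }
    }
}
```

A connect-only check such as telnet therefore cannot tell a stalled peer from a healthy one; telling them apart requires some application-level exchange with a deadline.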
>
> I've prepared a small test app with a REST API to publish a message to a
> queue (use the included Swagger UI pages) and an MDB consuming from the queue.
> At the link below you can find the source code of the app, scripts for
> creating master and slave brokers locally, the relevant parts of the
> broker.xml config files, and the resources required to set up Payara, as
> well as a patch tracking the changes I made to the Artemis RA & RAR
> projects' code to make it run in Payara:
> [https://drive.google.com/open?id=11DNBCLKfAwttfibDw0Ckm_mVVhXP2JiR]
> (the app needs the "test.input" address+queue created beforehand, since the
> MDB consumer does not create it automatically, and sets the log level for
> "org.apache.activemq.artemis" to ALL)