[https://issues.apache.org/jira/browse/AMQ-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071719#comment-17071719]
Gary Tully commented on AMQ-7455:
---------------------------------
Note the latest change for AMQ-5790, which retains a single connection for
the RAR deployment; this avoids the connection churn.
> Blocking XA Periodic Recovery and Container Operations when deploying
> ActiveMQ RA 5.15.x to JBoss EAP 6.4
> ---------------------------------------------------------------------------------------------------------
>
> Key: AMQ-7455
> URL: https://issues.apache.org/jira/browse/AMQ-7455
> Project: ActiveMQ
> Issue Type: Bug
> Components: RAR
> Affects Versions: 5.15.11, 5.15.12
> Reporter: Markus
> Priority: Major
> Labels: newbie, performance
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *Scenario*: deploy the ActiveMQ RA to EAP 6.4 with XA-enabled DataSources
> for JDBC and ActiveMQ Artemis as JMS broker in a failover config
> (active/passive), and run several applications that use JMS and JDBC
> datasources. (In our scenario, there are 4 XA JDBC DataSources and 42
> deployed WAR files.) The ActiveMQ resource adapter is used because the
> Artemis one is not compatible with this EAP version.
> JBoss EAP and Artemis are running on the same host in the examples below, but
> this does not matter. Artemis is running with example
> HA-by-network-replication settings on localhost:61616 and localhost:61617.
> The resource adapter settings on JBoss EAP side and on client side
> (especially properties of the broker URIs for ActiveMQ's connection factory)
> are exactly as described in [Red Hat's reference documentation for HA
> config|https://access.redhat.com/documentation/en-us/red_hat_jboss_a-mq/6.1/html-single/integrating_with_jboss_enterprise_application_platform/index]:
> * Add {{?jms.rmIdFromConnectionId=true}} to the connection URI
> * {{<is-same-rm-override>false</is-same-rm-override>}} in standalone.xml
> *Observed Symptoms*: On a regular basis, _undeployment of WAR files takes
> minutes (around 10) instead of seconds_, blocking other container
> management operations during this timeframe. This behavior can be traced
> back to running the ActiveMQ RA with Artemis as JMS broker. When running
> the same WAR files with the same config, but using JBoss EAP's internal
> HornetQ broker, these delays do not occur and undeployments take mere
> seconds.
> *Further Investigation*: This only affects Java EE modules that define at
> least one MDB (message-driven bean). During shutdown of the beans, the
> container management thread is blocked while de-registering the bean's XA
> resource; see the following stack trace, taken while a simple module was
> being undeployed:
> {noformat}
> "ServerService Thread Pool -- 277" #3709 prio=5 os_prio=0 tid=0x000000001d6e0000 nid=0x4df4 in Object.wait() [0x000000007c41e000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:502)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.waitForScanState(XARecoveryModule.java:935)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.removeXAResourceRecoveryHelper(XARecoveryModule.java:92)
>         - locked <0x000000008162bd38> (a java.util.concurrent.atomic.AtomicInteger)
>         at com.arjuna.ats.jbossatx.jta.RecoveryManagerService.removeXAResourceRecovery(RecoveryManagerService.java:117)
>         at org.jboss.jca.core.tx.jbossts.XAResourceRecoveryRegistryImpl.removeXAResourceRecovery(XAResourceRecoveryRegistryImpl.java:63)
>         at org.jboss.jca.core.rar.EndpointImpl.deactivate(EndpointImpl.java:226)
>         at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponent.deactivate(MessageDrivenComponent.java:224)
>         at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponent.stop(MessageDrivenComponent.java:195)
>         // and a couple more frames
> {noformat}
> During these 10 minutes of blocking while the module is supposed to be
> undeployed, EAP's log is flooded with thousands of lines like these:
> {noformat}
> 11:19:43,640 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,643 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,647 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,657 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,661 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,664 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> {noformat}
> Only after these lines stop being logged does the undeployment process
> continue.
> *What happens during this time in ActiveMQ:* Obviously, thousands of
> connections to Artemis are established during this time. Tracing back the
> code path that leads to these calls points to EAP's periodic XA resource
> recovery thread; see the following full stack trace (taken at the same
> point in time):
> {noformat}
> "Periodic Recovery" #96 prio=5 os_prio=0 tid=0x000000002484a000 nid=0x1b28 waiting for monitor entry [0x0000000028dbe000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:623)
>         - locked <0x00000000cc69eeb8> (a java.lang.Object)
>         at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
>         at org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)
>         at org.apache.activemq.transport.ResponseCorrelator.request(ResponseCorrelator.java:86)
>         at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1392)
>         at org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent(ActiveMQConnection.java:1486)
>         - locked <0x00000000cc69ff10> (a java.lang.Object)
>         at org.apache.activemq.ActiveMQConnection.start(ActiveMQConnection.java:527)
>         at org.apache.activemq.ra.ActiveMQResourceAdapter$2.newConnection(ActiveMQResourceAdapter.java:361)
>         at org.apache.activemq.ra.ActiveMQResourceAdapter$2.getResourceManagerId(ActiveMQResourceAdapter.java:278)
>         at org.apache.activemq.TransactionContext.isSameRM(TransactionContext.java:666)
>         at org.apache.activemq.ra.ActiveMQResourceAdapter$2.isSameRM(ActiveMQResourceAdapter.java:261)
>         at org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.isSameRM(XAResourceWrapperImpl.java:159)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.RecoveryXids.isSameRM(RecoveryXids.java:143)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.RecoveryXids.updateIfEquivalentRM(RecoveryXids.java:215)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.refreshXidScansForEquivalentXAResourceImpl(XARecoveryModule.java:796)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass(XARecoveryModule.java:570)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:181)
>         - locked <0x000000008162ba98> (a com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule)
>         at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:747)
>         at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:375)
> {noformat}
> From this stack trace I want to point out this line in particular:
> {{org.apache.activemq.TransactionContext.isSameRM()}}
> This method is part of the XA resource contract. Simply put, the periodic
> recovery process iterates over XA resources and looks for dormant XA
> transactions that need to be finished one way or another. Here it checks
> different combinations of XA resources to assess whether they came from
> the same resource manager ({{isSameRM()}}).
> During this recovery process (which in the stack trace above is still in
> the phase of gathering basic information, not yet actually working on XA
> transactions), the deregistration of the message-driven beans is blocked.
> *Analysis of the extremely long recovery process*: The {{XAResource}}
> instances that periodic recovery works on are created by
> {{org.apache.activemq.ra.ActiveMQResourceAdapter.getXAResources(ActivationSpec[])}}.
> The anonymous class that extends {{org.apache.activemq.TransactionContext}}
> and is instantiated there basically performs the following steps for each
> invocation of any of the XA resource contract methods:
> # Replace the current connection to Artemis with a freshly created one
> # Call {{*super*}}
> # Restore the current connection and close the new one
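> The three steps above can be modelled with a small, self-contained sketch
> (hypothetical names, plain Java, no ActiveMQ classes; it only mirrors the
> swap-delegate-restore pattern, it is not the actual adapter code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the wrap pattern: every XAResource-contract method
// swaps in a freshly created connection, delegates, then restores the old
// one. Names (newConnection, current, ...) are illustrative only.
public class WrapPatternSketch {
    static final AtomicInteger connectionsOpened = new AtomicInteger();
    static String current; // the "current" connection, initially null

    // Stand-in for an expensive broker connect; each call yields a new id.
    static String newConnection() {
        return "conn-" + connectionsOpened.incrementAndGet();
    }

    // Mirrors getResourceManagerId() as described in the steps above.
    static String getResourceManagerId() {
        String previous = current;
        current = newConnection();  // 1. replace with a fresh connection
        try {
            return current;         // 2. "super": rmId == connection id here
        } finally {
            current = previous;     // 3. close the new one, restore the old
        }
    }

    public static void main(String[] args) {
        String a = getResourceManagerId();
        String b = getResourceManagerId();
        System.out.println(a.equals(b)); // false: each call opened a new connection
    }
}
```

> With {{rmIdFromConnectionId}} enabled, the sketch also illustrates why two
> invocations can never agree on a resource manager id.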
> Specifically for {{isSameRM}} this results in the following chain of events:
> # Create a pristine connection to remote Artemis (in our case a
> FailoverTransport)
> # Call {{*super*.isSameRM(xaresource)}}
> # If {{xaresource}} is {{*null*}} or is not an instance of
> {{TransactionContext}}, return false
> ({{org/apache/activemq/TransactionContext.java:658}})
> # Get the current resource manager id and {{xaresource}}'s resource manager
> id. This effectively calls {{*this*.getResourceManagerId()}}. As noted above,
> this leads to the following calls:
> # Replace the current connection to Artemis (established in step 1) with a
> second pristine one
> # Call {{*super*.getResourceManagerId()}}
> # Call {{getResourceManagerId()}} on the connection of step 5
> # In {{org/apache/activemq/ActiveMQConnection.java:1622}}, the resource
> manager id effectively is the Connection Id in our setup (according to the
> flag set in the connection URI).
> # This id is _unique for each connection_ (see
> {{org/apache/activemq/ActiveMQConnection.java:240}})
> # Compare the resulting value with the resource manager id of xaresource and
> return the result
> # Close the connection to the JMS broker established in step 5 and set the
> one established in step 1 as "current" connection
> # Close the connection established in step 1 and reset current connection to
> whatever was there before (in my understanding that is always {{*null*}} as
> the instances are always being freshly created inside
> {{ActiveMQResourceAdapter.getXAResources}})
> This is repeated for each pair of XA resources, or rather activation
> specs, that the recovery manager encounters. The log spamming stems from
> steps 1 and 5, because each time an INFO message is logged that tells us a
> new connection has been established to the server.
> This also happens when the recovery process wants to know whether a JDBC
> connection to our SQL server (of which we have plenty, due to several
> modules each with a connection-pooling data source) is from the same
> resource manager as a JMS connection to Artemis - which it obviously never
> is.
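> A rough back-of-the-envelope model (my assumption: one {{isSameRM()}}
> check per resource pair, and two fresh broker connections per check, per
> steps 1 and 5) shows how this scales to the thousands of log lines
> observed:

```java
// Hypothetical estimate of connections opened during one recovery scan,
// assuming each pair of XA resources is compared once and each comparison
// opens two fresh broker connections (steps 1 and 5 above).
public class ConnectionChurnEstimate {
    static long connectionsPerScan(int resources) {
        long pairs = (long) resources * (resources - 1) / 2; // pairwise checks
        return pairs * 2;                                    // two connects each
    }

    public static void main(String[] args) {
        // e.g. 4 XA JDBC datasources plus a few dozen MDB activation specs
        System.out.println(connectionsPerScan(46)); // 2070 connections per scan
    }
}
```

> The exact count depends on how the recovery manager pairs the resources,
> but the quadratic growth is consistent with the observed log flood.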
> *Problematic Aspects of this code*:
> # It doesn't make sense to establish pristine connections to the remote
> broker for each invocation of one of {{XAResource}}'s methods. Connection
> setup is expensive as it is with a plain non-failover URI, and even more
> so with a failover transport - this is borne out by tests as well as by
> the code itself.
> # If there is a way to answer {{isSameRM()}} without having to do any
> work, it should be taken as early as possible.
> # If the connection factory is configured with {{rmIdFromConnectionId}}
> enabled, then {{isSameRM()}} in my understanding will never return
> {{*true*}}.
> *Possible solutions for each of these aspects*:
> # In order to change the basic pattern of
> create-connection-then-do-something-then-close-connection that happens
> inside {{ActiveMQResourceAdapter.getXAResources()}}, a big portion of
> this code would probably have to be rewritten.
> # In the current code base, the instanceof check (see step 3 above) can
> be extracted and inlined into the anonymous inner class defined inside
> {{ActiveMQResourceAdapter.getXAResources()}}.
> # I don't know about the semantic correctness of this, i.e. whether this
> behavior is desired; but if it is, and all the code I read strengthens
> this assumption, then this is another candidate for short-circuiting the
> expensive connection creation: if this config flag is set on the
> connection factory, directly return {{*false*}}.
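> Mitigations 2 and 3 could be sketched as follows (illustrative class and
> method names, not a patch against the real {{ActiveMQResourceAdapter}};
> it only shows where the early returns would sit):

```java
// Hypothetical sketch: both short-circuits run before any broker
// connection would be created.
public class IsSameRmShortCircuit {
    final boolean rmIdFromConnectionId; // mirrors the connection-URI flag

    IsSameRmShortCircuit(boolean rmIdFromConnectionId) {
        this.rmIdFromConnectionId = rmIdFromConnectionId;
    }

    boolean isSameRM(Object xaresource) {
        // Mitigation 2: the null/instanceof check from super, hoisted so
        // it runs before a connection is established (a JDBC XAResource
        // can never match a JMS one).
        if (!(xaresource instanceof IsSameRmShortCircuit)) {
            return false;
        }
        // Mitigation 3: with rmIdFromConnectionId=true every connection
        // has a unique resource manager id, so the comparison can never
        // succeed; answer without connecting at all.
        if (rmIdFromConnectionId) {
            return false;
        }
        return expensiveCompare((IsSameRmShortCircuit) xaresource);
    }

    // Placeholder for the create-connection/compare/close path.
    boolean expensiveCompare(IsSameRmShortCircuit other) {
        return false;
    }

    public static void main(String[] args) {
        IsSameRmShortCircuit rm = new IsSameRmShortCircuit(true);
        System.out.println(rm.isSameRM("a JDBC XAResource")); // false, no connect
    }
}
```

> In both cases the answer is known before any connection is opened, which
> is exactly what the expensive path currently fails to exploit.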
> So, basically, solutions 2 and 3 somewhat mitigate the antipattern(?)
> described in 1, and only this aspect is addressed by the pull request
> linked to this ticket.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)