[https://issues.apache.org/jira/browse/AMQ-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071719#comment-17071719]
Gary Tully commented on AMQ-7455:
---------------------------------
Note the latest change for AMQ-5790, which retains a single connection for
the RAR deployment; this avoids the connection churn.
> Blocking XA Periodic Recovery and Container Operations when deploying
> ActiveMQ RA 5.15.x to JBoss EAP 6.4
> ---------------------------------------------------------------------------------------------------------
>
> Key: AMQ-7455
> URL: https://issues.apache.org/jira/browse/AMQ-7455
> Project: ActiveMQ
> Issue Type: Bug
> Components: RAR
> Affects Versions: 5.15.11, 5.15.12
> Reporter: Markus
> Priority: Major
> Labels: newbie, performance
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *Scenario*: deploy the ActiveMQ RA to EAP 6.4 with XA-enabled DataSources
> for JDBC and ActiveMQ Artemis as JMS broker in a failover config
> (active/passive), and run several applications that use JMS and JDBC
> datasources. (In our scenario, there are 4 XA JDBC DataSources and 42
> deployed WAR files.) The ActiveMQ resource adapter is used because the
> Artemis one is not compatible with this EAP version.
> JBoss EAP and Artemis are running on the same host in the examples below, but
> this does not matter. Artemis is running with example
> HA-by-network-replication settings on localhost:61616 and localhost:61617.
> The resource adapter settings on JBoss EAP side and on client side
> (especially properties of the broker URIs for ActiveMQ's connection factory)
> are exactly as described in [Red Hat's reference documentation for HA
> config|https://access.redhat.com/documentation/en-us/red_hat_jboss_a-mq/6.1/html-single/integrating_with_jboss_enterprise_application_platform/index]:
> * Add {{?jms.rmIdFromConnectionId=true}} to the connection URI
> * {{<is-same-rm-override>false</is-same-rm-override>}} in standalone.xml
> *Observed Symptoms*: On a regular basis, _undeployment of WAR files takes
> minutes (around 10) instead of seconds_, blocking other container
> management operations during this timeframe. This behavior can be traced
> back to running the ActiveMQ RA with Artemis as JMS broker. When running
> the same WAR files with the same config, but using JBoss EAP's internal
> HornetQ broker, these delays do not occur and undeployments take mere
> seconds.
> *Further Investigation*: This only affects Java EE modules that define at
> least one MDB (message-driven bean). During shutdown of the beans, the
> container management thread is blocked while de-registering the bean's XA
> resource; see the following stack trace, taken while a simple module was
> being undeployed:
> {noformat}
> "ServerService Thread Pool -- 277" #3709 prio=5 os_prio=0 tid=0x000000001d6e0000 nid=0x4df4 in Object.wait() [0x000000007c41e000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:502)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.waitForScanState(XARecoveryModule.java:935)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.removeXAResourceRecoveryHelper(XARecoveryModule.java:92)
>         - locked <0x000000008162bd38> (a java.util.concurrent.atomic.AtomicInteger)
>         at com.arjuna.ats.jbossatx.jta.RecoveryManagerService.removeXAResourceRecovery(RecoveryManagerService.java:117)
>         at org.jboss.jca.core.tx.jbossts.XAResourceRecoveryRegistryImpl.removeXAResourceRecovery(XAResourceRecoveryRegistryImpl.java:63)
>         at org.jboss.jca.core.rar.EndpointImpl.deactivate(EndpointImpl.java:226)
>         at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponent.deactivate(MessageDrivenComponent.java:224)
>         at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponent.stop(MessageDrivenComponent.java:195)
>         // and a couple more frames
> {noformat}
> During these 10 minutes of blocking while the module is supposed to be
> undeployed, EAP's log is flooded with thousands of lines like these:
> {noformat}
> 11:19:43,640 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,643 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,647 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,657 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,661 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> 11:19:43,664 INFO [org.apache.activemq.transport.failover.FailoverTransport]
> (ActiveMQ Task-1) Successfully connected to tcp://localhost:61616
> {noformat}
> Only after these lines stop being logged does the undeployment process
> continue.
> *What happens during this time in ActiveMQ:* Obviously, thousands of
> connections to Artemis are established during this time. Tracing back the
> code path that leads to these calls points to EAP's periodic XA resource
> recovery thread; see the following full stack trace (taken at the same
> point in time):
> {noformat}
> "Periodic Recovery" #96 prio=5 os_prio=0 tid=0x000000002484a000 nid=0x1b28 waiting for monitor entry [0x0000000028dbe000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:623)
>         - locked <0x00000000cc69eeb8> (a java.lang.Object)
>         at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
>         at org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)
>         at org.apache.activemq.transport.ResponseCorrelator.request(ResponseCorrelator.java:86)
>         at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1392)
>         at org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent(ActiveMQConnection.java:1486)
>         - locked <0x00000000cc69ff10> (a java.lang.Object)
>         at org.apache.activemq.ActiveMQConnection.start(ActiveMQConnection.java:527)
>         at org.apache.activemq.ra.ActiveMQResourceAdapter$2.newConnection(ActiveMQResourceAdapter.java:361)
>         at org.apache.activemq.ra.ActiveMQResourceAdapter$2.getResourceManagerId(ActiveMQResourceAdapter.java:278)
>         at org.apache.activemq.TransactionContext.isSameRM(TransactionContext.java:666)
>         at org.apache.activemq.ra.ActiveMQResourceAdapter$2.isSameRM(ActiveMQResourceAdapter.java:261)
>         at org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.isSameRM(XAResourceWrapperImpl.java:159)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.RecoveryXids.isSameRM(RecoveryXids.java:143)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.RecoveryXids.updateIfEquivalentRM(RecoveryXids.java:215)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.refreshXidScansForEquivalentXAResourceImpl(XARecoveryModule.java:796)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass(XARecoveryModule.java:570)
>         at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:181)
>         - locked <0x000000008162ba98> (a com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule)
>         at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:747)
>         at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:375)
> {noformat}
> From this stack trace I want to point out this line in particular:
> {{org.apache.activemq.TransactionContext.isSameRM()}}
> This method is part of the XA resource contract. Simply put, the periodic
> recovery process iterates over XA resources and looks for dormant XA
> transactions that need to be finished one way or another. Here it checks
> different combinations of XA resources to assess whether they came from
> the same resource manager ({{isSameRM()}}).
> During this recovery process (which in the stack trace above is still in
> the phase of gathering basic information, not yet actually working on XA
> transactions), the deregistration of the message-driven beans is blocked.
> *Analysis of the extremely long recovery process*: The {{XAResource}}
> instances that periodic recovery works on are created by
> {{org.apache.activemq.ra.ActiveMQResourceAdapter.getXAResources(ActivationSpec[])}}.
> The anonymous class that extends {{org.apache.activemq.TransactionContext}}
> and is instantiated there basically performs the following steps for each
> invocation of any of the XA resource contract methods:
> # Replace the current connection to Artemis with a freshly created one
> # Call {{*super*}}
> # Restore the current connection and close the new one
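> The three steps above can be modelled with a small, self-contained sketch
> (hypothetical names, plain Java, no ActiveMQ classes; it only mirrors the
> swap-delegate-restore pattern, it is not the actual adapter code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the wrap pattern: every XAResource-contract method
// swaps in a freshly created connection, delegates, then restores the old
// one. Names (newConnection, current, ...) are illustrative only.
public class WrapPatternSketch {
    static final AtomicInteger connectionsOpened = new AtomicInteger();
    static String current; // the "current" connection, initially null

    // Stand-in for an expensive broker connect; each call yields a new id.
    static String newConnection() {
        return "conn-" + connectionsOpened.incrementAndGet();
    }

    // Mirrors getResourceManagerId() as described in the steps above.
    static String getResourceManagerId() {
        String previous = current;
        current = newConnection();  // 1. replace with a fresh connection
        try {
            return current;         // 2. "super": rmId == connection id here
        } finally {
            current = previous;     // 3. close the new one, restore the old
        }
    }

    public static void main(String[] args) {
        String a = getResourceManagerId();
        String b = getResourceManagerId();
        System.out.println(a.equals(b)); // false: each call opened a new connection
    }
}
```

> With {{rmIdFromConnectionId}} enabled, the sketch also illustrates why two
> invocations can never agree on a resource manager id.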
> Specifically for {{isSameRM}} this results in the following chain of events:
> # Create a pristine connection to remote Artemis (in our case a
> FailoverTransport)
> # Call {{*super*.isSameRM(xaresource)}}
> # If {{xaresource}} is {{*null*}} or is not an instance of
> {{TransactionContext}}, return false
> ({{org/apache/activemq/TransactionContext.java:658}})
> # Get the current resource manager id and {{xaresource}}'s resource manager
> id. This effectively calls {{*this*.getResourceManagerId()}}. As noted above,
> this leads to the following calls:
> # Replace the current connection to Artemis (established in step 1) with a
> second pristine one
> # Call {{*super*.getResourceManagerId()}}
> # Call {{getResourceManagerId()}} on the connection of step 5
> # In {{org/apache/activemq/ActiveMQConnection.java:1622}}, the resource
> manager id effectively is the Connection Id in our setup (according to the
> flag set in the connection URI).
> # This id is _unique for each connection_ (see
> {{org/apache/activemq/ActiveMQConnection.java:240}})
> # Compare the resulting value with the resource manager id of xaresource and
> return the result
> # Close the connection to the JMS broker established in step 5 and set the
> one established in step 1 as "current" connection
> # Close the connection established in step 1 and reset current connection to
> whatever was there before (in my understanding that is always {{*null*}} as
> the instances are always being freshly created inside
> {{ActiveMQResourceAdapter.getXAResources}})
> This is repeated for each pair of XA resources, or rather activation
> specs, that the recovery manager encounters. The log spamming stems from
> steps 1 and 5, because each time an INFO message is logged that tells us a
> new connection has been established to the server.
> This also happens when the recovery process wants to know whether a JDBC
> connection to our SQL server (of which we have plenty, due to several
> modules each with a connection-pooling data source) is from the same
> resource manager as a JMS connection to Artemis - which it obviously never
> is.
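> A rough back-of-the-envelope model (my assumption: one {{isSameRM()}}
> check per resource pair, and two fresh broker connections per check, per
> steps 1 and 5) shows how this scales to the thousands of log lines
> observed:

```java
// Hypothetical estimate of connections opened during one recovery scan,
// assuming each pair of XA resources is compared once and each comparison
// opens two fresh broker connections (steps 1 and 5 above).
public class ConnectionChurnEstimate {
    static long connectionsPerScan(int resources) {
        long pairs = (long) resources * (resources - 1) / 2; // pairwise checks
        return pairs * 2;                                    // two connects each
    }

    public static void main(String[] args) {
        // e.g. 4 XA JDBC datasources plus a few dozen MDB activation specs
        System.out.println(connectionsPerScan(46)); // 2070 connections per scan
    }
}
```

> The exact count depends on how the recovery manager pairs the resources,
> but the quadratic growth is consistent with the observed log flood.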
> *Problematic Aspects of this code*:
> # It doesn't make sense to establish pristine connections to the remote
> broker for each invocation of one of {{XAResource}}'s methods. Connection
> setup is expensive as it is with a plain non-failover URI, and even more
> so with a failover transport - this is borne out by tests as well as by
> the code itself.
> # If there is a way to answer {{isSameRM()}} without having to do any
> work, it should be taken as early as possible.
> # If the connection factory is configured with {{rmIdFromConnectionId}}
> enabled, then {{isSameRM()}} in my understanding will never return
> {{*true*}}.
> *Possible solutions for each of these aspects*:
> # In order to change the basic pattern of
> create-connection-then-do-something-then-close-connection that happens
> inside {{ActiveMQResourceAdapter.getXAResources()}}, a big portion of
> this code would probably have to be rewritten.
> # In the current code base, the instanceof check (see step 3 above) can
> be extracted and inlined into the anonymous inner class defined inside
> {{ActiveMQResourceAdapter.getXAResources()}}.
> # I don't know about the semantic correctness of this, i.e. whether this
> behavior is desired; but if it is, and all the code I read strengthens
> this assumption, then this is another candidate for short-circuiting the
> expensive connection creation: if this config flag is set on the
> connection factory, directly return {{*false*}}.
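> Mitigations 2 and 3 could be sketched as follows (illustrative class and
> method names, not a patch against the real {{ActiveMQResourceAdapter}};
> it only shows where the early returns would sit):

```java
// Hypothetical sketch: both short-circuits run before any broker
// connection would be created.
public class IsSameRmShortCircuit {
    final boolean rmIdFromConnectionId; // mirrors the connection-URI flag

    IsSameRmShortCircuit(boolean rmIdFromConnectionId) {
        this.rmIdFromConnectionId = rmIdFromConnectionId;
    }

    boolean isSameRM(Object xaresource) {
        // Mitigation 2: the null/instanceof check from super, hoisted so
        // it runs before a connection is established (a JDBC XAResource
        // can never match a JMS one).
        if (!(xaresource instanceof IsSameRmShortCircuit)) {
            return false;
        }
        // Mitigation 3: with rmIdFromConnectionId=true every connection
        // has a unique resource manager id, so the comparison can never
        // succeed; answer without connecting at all.
        if (rmIdFromConnectionId) {
            return false;
        }
        return expensiveCompare((IsSameRmShortCircuit) xaresource);
    }

    // Placeholder for the create-connection/compare/close path.
    boolean expensiveCompare(IsSameRmShortCircuit other) {
        return false;
    }

    public static void main(String[] args) {
        IsSameRmShortCircuit rm = new IsSameRmShortCircuit(true);
        System.out.println(rm.isSameRM("a JDBC XAResource")); // false, no connect
    }
}
```

> In both cases the answer is known before any connection is opened, which
> is exactly what the expensive path currently fails to exploit.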
> So, basically, solutions 2 and 3 somewhat mitigate the antipattern(?)
> described in 1, and only this aspect is addressed by the pull request
> linked to this ticket.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)