Hi Folks,

I have been looking into some issues with TomEE using Websphere MQ,
specifically issues where XA resources are returning a -7 error during a
transaction commit when the system is under load. The -7 error indicates
that the resource is closed.

The result of this error seems to be resources staying in the system
somehow associated with the connection, and subsequently, these can't be
enlisted in transactions (because they are closed).

The stacktrace is like this, and happens over and over again as the server
attempts to process more messages from the queue.

WARN  Transaction- Unable to enlist XAResource
org.apache.geronimo.transaction.manager.WrapperNamedXAResource@3dd56e90,
errorCode: -7
javax.transaction.xa.XAException: The method 'xa_start' has failed with
errorCode '-7' due to the resource being closed.
        at com.ibm.mq.jmqi.JmqiXAResource.start(JmqiXAResource.java:946)
~[com.ibm.mq.jmqi.jar:7.5.0.5 - p750-005-150424]
        at com.ibm.mq.connector.xa.XARWrapper.start(XARWrapper.java:581)
~[com.ibm.mq.connector.jar:7.5.0.5-p750-005-150424]
        at
org.apache.geronimo.transaction.manager.WrapperNamedXAResource.start(WrapperNamedXAResource.java:111)
~[geronimo-transaction-3.1.4.jar:3.1.4]
        at
org.apache.geronimo.transaction.manager.TransactionImpl.enlistResource(TransactionImpl.java:209)
[geronimo-transaction-3.1.4.jar:3.1.4]
        at
org.apache.geronimo.connector.outbound.TransactionEnlistingInterceptor.getConnection(TransactionEnlistingInterceptor.java:60)
[geronimo-connector-3.1.4.jar:3.1.4]
        at
org.apache.geronimo.connector.outbound.TransactionCachingInterceptor.getConnection(TransactionCachingInterceptor.java:101)
[geronimo-connector-3.1.4.jar:3.1.4]
        at
org.apache.geronimo.connector.outbound.ConnectionHandleInterceptor.getConnection(ConnectionHandleInterceptor.java:43)
[geronimo-connector-3.1.4.jar:3.1.4]
        at
org.apache.geronimo.connector.outbound.TCCLInterceptor.getConnection(TCCLInterceptor.java:39)
[geronimo-connector-3.1.4.jar:3.1.4]
        at
org.apache.geronimo.connector.outbound.ConnectionTrackingInterceptor.getConnection(ConnectionTrackingInterceptor.java:66)
[geronimo-connector-3.1.4.jar:3.1.4]
        at
org.apache.geronimo.connector.outbound.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:81)
[geronimo-connector-3.1.4.jar:3.1.4]
        at
com.ibm.mq.connector.outbound.ConnectionFactoryImpl.createManagedJMSConnection(ConnectionFactoryImpl.java:194)
[com.ibm.mq.connector.jar:7.5.0.5-p750-005-150424]
        at
com.ibm.mq.connector.outbound.ConnectionFactoryImpl.createConnectionInternal(ConnectionFactoryImpl.java:153)
[com.ibm.mq.connector.jar:7.5.0.5-p750-005-150424]
        at
com.ibm.mq.connector.outbound.ConnectionFactoryImpl.createConnection(ConnectionFactoryImpl.java:138)
[com.ibm.mq.connector.jar:7.5.0.5-p750-005-150424]
        at
com.ibm.mq.connector.outbound.ConnectionFactoryImpl.createConnection(ConnectionFactoryImpl.java:123)
[com.ibm.mq.connector.jar:7.5.0.5-p750-005-150424]

The issue itself bears some resemblance to these posts:

http://tomee-openejb.979440.n4.nabble.com/Unable-to-enlist-XAResource-error-td4666552.html
http://tomee-openejb.979440.n4.nabble.com/Does-tomEE-issue-double-rollbacks-td4666090.html
http://tomee-openejb.979440.n4.nabble.com/errorcode-100-when-using-websphere-MQ-with-tomee-td4666519.html

com.ibm.mq.jmqi.JmqiXAResource has a isClosed field, and this is set when
close_internal() is called. I tracked this call using AspectJ, and it
appeared to happen here:

com.ibm.mq.jmqi.JmqiXAResource.close_internal(JmqiXAResource.java:296),
com.ibm.mq.jmqi.JmqiXAResource.close(JmqiXAResource.java:423),
com.ibm.mq.jmqi.JmqiXAResource.close(JmqiXAResource.java:410),
com.ibm.msg.client.wmq.internal.WMQXASession.close(WMQXASession.java:163),
com.ibm.msg.client.jms.internal.JmsSessionImpl.close(JmsSessionImpl.java:375),
com.ibm.msg.client.jms.internal.JmsSessionImpl.close(JmsSessionImpl.java:303),
com.ibm.mq.jms.MQSession.close(MQSession.java:298),
com.ibm.mq.connector.outbound.ManagedConnectionImpl.destroy(ManagedConnectionImpl.java:298),
org.apache.geronimo.connector.outbound.MCFConnectionInterceptor.returnConnection(MCFConnectionInterceptor.java:67),
org.apache.geronimo.connector.outbound.XAResourceInsertionInterceptor.returnConnection(XAResourceInsertionInterceptor.java:47),
org.apache.geronimo.connector.outbound.AbstractSinglePoolConnectionInterceptor.internalReturn(AbstractSinglePoolConnectionInterceptor.java:148),
org.apache.geronimo.connector.outbound.AbstractSinglePoolConnectionInterceptor.returnConnection(AbstractSinglePoolConnectionInterceptor.java:129),
org.apache.geronimo.connector.outbound.TransactionEnlistingInterceptor.returnConnection(TransactionEnlistingInterceptor.java:118),
org.apache.geronimo.connector.outbound.TransactionCachingInterceptor.returnConnection(TransactionCachingInterceptor.java:119),
org.apache.geronimo.connector.outbound.ConnectionHandleInterceptor.returnConnection(ConnectionHandleInterceptor.java:71),
org.apache.geronimo.connector.outbound.TCCLInterceptor.returnConnection(TCCLInterceptor.java:50),
org.apache.geronimo.connector.outbound.ConnectionTrackingInterceptor.returnConnection(ConnectionTrackingInterceptor.java:91),
org.apache.openejb.resource.AutoConnectionTracker.setEnvironment(AutoConnectionTracker.java:62),
org.apache.geronimo.connector.outbound.ConnectionTrackingInterceptor.getConnection(ConnectionTrackingInterceptor.java:65),
org.apache.geronimo.connector.outbound.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:81),
com.ibm.mq.connector.outbound.ConnectionFactoryImpl.createManagedJMSConnection(ConnectionFactoryImpl.java:194),
com.ibm.mq.connector.outbound.ConnectionFactoryImpl.createConnectionInternal(ConnectionFactoryImpl.java:153),
com.ibm.mq.connector.outbound.ConnectionFactoryImpl.createConnection(ConnectionFactoryImpl.java:138),
com.ibm.mq.connector.outbound.ConnectionFactoryImpl.createConnection(ConnectionFactoryImpl.java:123),

The interesting part here is the
org.apache.openejb.resource.AutoConnectionTracker.setEnvironment() call,
which is destroying connections where the proxy for the connection no
longer has any references. I.e. if you had some code that did this:

----
Connection conn = connectionFactory.createConnection(); // conn is actually
a proxy to the connection
// do work

conn = null; // or conn is out of scope
----

note that conn.close() is not called. The reference count to the proxy is
now zero, so it counts as abandoned, and the next call to
createConnection() destroys the associated connection. However, if all this
happens in a transacted method, the JmqiXAResource is still enlisted with
the transaction. So when the transaction goes to commit, it fails, because
the JmqiXAResource is closed. The resource is still associated with a
ManagedConnectionInfo object through ManagedConnectionInfo.xares
(WrapperNamedXAResource). xaResource (XARWrapper - part of the RAR).theXAR
(JmqiXAResource - part of the RAR). The ManagedConnectionInfo object
somehow comes back to the pool after the rollback, and gets used again,
with the TransactionImpl.enlistResource() call failing as above.

I suspect that we're seeing this under load as GC will be taking place more
often and calls to createConnection() will be happening more frequently.

The connection proxies themselves don't do much - only calls to finalize()
and clone() are caught. I propose this patch to allow the proxies to be
disabled (the current behaviour is still the default):
https://github.com/apache/tomee/pull/174 and welcome feedback.

Running this specific case with proxies disabled has yielded positive
results. I'll try and extend this and reproduce it in a test case as well.
I'm wondering if AutoConnectionTracker's use of ReferenceQueue is thread
safe. Do anyone have any thoughts on this?

Cheers

Jon

Reply via email to