Marc created ARTEMIS-5806:
-----------------------------
Summary: Message loss due to XA session rollback after broker
restart
Key: ARTEMIS-5806
URL: https://issues.apache.org/jira/browse/ARTEMIS-5806
Project: Artemis
Issue Type: Bug
Components: Broker
Affects Versions: 2.44.0, 2.40.0
Reporter: Marc
Attachments: MessageLossAfterRestart.png
In our setup, an MDB deployed in an Oracle WebLogic container connects to an
ActiveMQ Artemis broker using XA transactions. To receive messages, the
WebLogic MDB framework polls repeatedly: it opens an XA transaction
({{xaStart}}), performs a {{receive(timeout)}}, and then ends the
transaction ({{xaEnd}}). If a message was received, the transaction is
prepared and committed; otherwise it is rolled back.
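
The polling cycle described above can be sketched roughly as follows. This is only an illustration built on the standard {{javax.transaction.xa.XAResource}} interface, not WebLogic's actual implementation. The class name {{XaPollSketch}}, the method {{pollOnce}}, and the {{receive}} supplier (a stand-in for the consumer's {{receive(timeout)}} that returns null on timeout) are assumptions made for the example.

```java
import java.util.function.Supplier;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Sketch of one polling cycle; hypothetical, not WebLogic's code. */
class XaPollSketch {

    /**
     * @param receive hypothetical stand-in for consumer.receive(timeout);
     *                returns null when the timeout expires without a message
     * @return true if a message was received in this cycle
     */
    static boolean pollOnce(XAResource xa, Xid xid, Supplier<String> receive)
            throws XAException {
        xa.start(xid, XAResource.TMNOFLAGS);   // xaStart
        String message = receive.get();        // receive(timeout)
        xa.end(xid, XAResource.TMSUCCESS);     // xaEnd

        if (message == null) {
            xa.rollback(xid);                  // nothing received: roll back
            return false;
        }
        // Message received: two-phase commit. A read-only vote from
        // prepare() needs no commit call.
        if (xa.prepare(xid) == XAResource.XA_OK) {
            xa.commit(xid, false);
        }
        return true;
    }
}
```

Every call in this cycle refers to the xid by value, so each step depends on the broker still knowing that xid in the session the call arrives on.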
During a graceful broker shutdown, all active transactions and sessions on the
broker are closed. That part works as expected. However, after the restart we
encounter problematic behavior:
The MDB begins polling again ({{xaStart}} + {{receive(timeout)}}).
Before the {{receive()}} times out, the WebLogic JTA framework tries, in
parallel, to finish the transaction that was open before the shutdown. It does
so in the same session that the MDB is polling on. Since that transaction no
longer exists on the broker, {{xaEnd}} fails with _"Cannot find suspended
transaction to end"_. WebLogic JTA then forces an {{xaRollback}}, which also
fails with _"Cannot find xid in resource manager"_. On the broker side, the
session is rolled back
(see
[ServerSessionImpl.java#L1627|https://github.com/apache/artemis/blob/fa1da6e6301fd89f7ec6dcdb98fd4366597082fa/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java#L1627]).
The session rollback cancels all open transactions in the session,
including the ongoing MDB polling transaction.
The real problem occurs afterwards, when a new message is produced and ready to
be delivered to the MDB poller ({{receive(timeout)}}). Artemis delivers the
message, and the MDB poller tries to end the transaction ({{xaEnd}}). Because
the transaction was already removed during the previous session rollback, this
fails with _"Cannot find suspended transaction to end"_. The MDB poller then
forces a global rollback: it drops the message and attempts to roll back on the
Artemis broker, which also fails (_"Cannot find xid in resource manager"_).
As a result, the message is lost on the Artemis broker: it has been removed
from the queue, and there is no open or prepared transaction for it anymore.
Here is a short version of the flow (a simple sequence diagram is attached as
well):
{code:java}
xaStart(xid1)       (session1)
receive()
--- restart broker ---
xaStart(xid2)       (session2)
receive()
xaEnd(xid1)         (session2)
  -> Cannot find suspended transaction to end
xaRollback(xid1)    (session2)
  -> Cannot find xid in resource manager
  -> session rollback removes xid1 and all other xids in the session (including xid2)
message arrives and is received with xid2
xaEnd(xid2)
  -> Cannot find suspended transaction to end
xaRollback(xid2)
  -> Cannot find xid in resource manager
message dropped due to exception; message no longer on queue and no transaction left on Artemis
{code}
To reproduce this behavior, I adapted the XA receive example in a fork:
[https://github.com/leisma/activemq-artemis-examples/commit/61deb9832eefeda360ff3207b3ad8e56c4ea2aa6]
(You need to run a broker separately to execute it)
I’m not sure whether Artemis implicitly assumes that only one XA transaction
may exist per session. I could not find clear guidance in the JTA specification
or other documentation regarding how XA transactions should behave in this
scenario.
Is this the expected behavior?
Or would it be possible for Artemis to check whether a session still contains
active transactions before performing a rollback, which would prevent the
message loss we are seeing?
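
To make the question concrete, here is a small toy model of the sequence after the restart. It is not Artemis code: {{ToySession}}, its methods, and the {{keepOthersOnUnknownXid}} flag are hypothetical names invented for this sketch, and xids are plain strings. With the flag off, a rollback for an unknown xid clears every transaction in the session, as described in this report. With the flag on, it models one possible reading of the suggested check, where an unknown xid leaves the other transactions in the session untouched.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of one broker-side session. Hypothetical; this is not Artemis code. */
class ToySession {
    private final Set<String> openXids = new HashSet<>();
    private final boolean keepOthersOnUnknownXid;

    ToySession(boolean keepOthersOnUnknownXid) {
        this.keepOthersOnUnknownXid = keepOthersOnUnknownXid;
    }

    void xaStart(String xid) {
        openXids.add(xid);
    }

    /** Returns false to model "Cannot find suspended transaction to end". */
    boolean xaEnd(String xid) {
        return openXids.contains(xid);
    }

    /** Returns false to model "Cannot find xid in resource manager". */
    boolean xaRollback(String xid) {
        if (openXids.remove(xid)) {
            return true;
        }
        if (!keepOthersOnUnknownXid) {
            // Behaviour described in the report: the whole session is rolled
            // back, which also discards unrelated transactions.
            openXids.clear();
        }
        return false;
    }

    /** Replays the post-restart steps; true if xid2 can still be ended. */
    static boolean pollingTxSurvives(boolean keepOthersOnUnknownXid) {
        ToySession session2 = new ToySession(keepOthersOnUnknownXid);
        session2.xaStart("xid2");      // MDB poller starts polling again
        session2.xaEnd("xid1");        // JTA cleanup of the stale xid1 fails
        session2.xaRollback("xid1");   // forced rollback of xid1 fails too
        return session2.xaEnd("xid2"); // poller ends its own transaction
    }

    public static void main(String[] args) {
        System.out.println("session rollback on unknown xid: " + pollingTxSurvives(false));
        System.out.println("unknown xid leaves others alone: " + pollingTxSurvives(true));
    }
}
```

In this model the first line prints {{false}} (xid2 is gone by the time the poller ends it) and the second prints {{true}}. Whether such a guard is compatible with the broker's other uses of session rollback is exactly the open question.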