[
https://issues.apache.org/jira/browse/QPID-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martin Ritchie reassigned QPID-1871:
------------------------------------
Assignee: Martin Ritchie
> During Rollback Client Rejects Message after sending TxRollback
> ---------------------------------------------------------------
>
> Key: QPID-1871
> URL: https://issues.apache.org/jira/browse/QPID-1871
> Project: Qpid
> Issue Type: Bug
> Components: Java Broker, Java Client
> Affects Versions: M4, 0.5
> Reporter: Martin Ritchie
> Assignee: Martin Ritchie
>
> Summary:
> See QPID-1864 for annotated log output.
> The log output is from a run with the Java broker, but highlights that the
> client dispatcher thread is not synchronized with the main thread during
> rollback.
> As a result the main thread sends the TxRollback before the Dispatcher has
> sent its Reject message. On the Java broker this results in the unrejected
> message being redelivered, possibly out of order, depending on what other
> messages have been released on the message queue.
> If we are to continue to rely on the dispatcher thread rejecting/releasing
> the message it is currently processing (i.e. the message that is neither in
> the _queue preDispatchQueue nor the _synchronousQueue for receiver delivery)
> then we will need to synchronize with the main thread's rollback/recover
> calls so that the dispatcher can finish processing its message before the
> rollback/recover completes.
> The message that the Dispatcher thread holds can be seen in AMQSession
> L2866: dispatchMessage().
> On Rollback we stop the dispatcher (L2763), which can result in the dispatcher
> thread stopping on L2877 and holding on to the message it is in the middle
> of delivering. More likely, during recover the dispatcher will block on the
> lock at L2870.
> When the dispatcher is restarted (L2792) it is then free to reject its
> message. However, the next thing the thread that restarted the dispatcher
> does is send the rollback command (L1553), which is where the race condition
> occurs.
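The ordering problem described above can be sketched with a toy model. This is only an illustration under assumptions: the class, the method names and the "wire" list are hypothetical and are not Qpid code. The main thread waits on a latch until the dispatcher has rejected its in-flight message, so the Reject always precedes the TxRollback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical toy model; none of these names are Qpid classes.
class RollbackOrderingSketch {
    // Frames in the order they would reach the broker.
    private final List<String> wire =
            Collections.synchronizedList(new ArrayList<String>());
    // Released once the dispatcher has dealt with its in-flight message.
    private final CountDownLatch inFlightHandled = new CountDownLatch(1);

    // Dispatcher thread: rejects the message it is holding, then signals.
    void dispatcherRejectsInFlight() {
        wire.add("Reject");
        inFlightHandled.countDown();
    }

    // Main thread: waits for the dispatcher before sending the rollback.
    void rollback() throws InterruptedException {
        inFlightHandled.await();
        wire.add("TxRollback");
    }

    // Runs one rollback with a deliberately slow dispatcher and
    // returns the frames in the order they were written.
    static List<String> run() {
        final RollbackOrderingSketch s = new RollbackOrderingSketch();
        Thread dispatcher = new Thread(() -> {
            try {
                Thread.sleep(50); // simulate the dispatcher being behind
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            s.dispatcherRejectsInFlight();
        }, "dispatcher");
        dispatcher.start();
        try {
            s.rollback();
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<String>(s.wire);
    }
}
```

Without the latch the two frames could be written in either order, which is the race reported here. A latch is just one way to express the hand-off; the real fix may well use the session's existing locks instead.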
> Potential Fix:
> Message Rejection should be performed BEFORE we stop the dispatcher.
> On L2825 we remove the message from the _queue (preDispatchQueue) and then
> potentially sit on the message at L2877 when we get stopped.
> If the reject call on L2888 came before the wait then we could reject the
> message rather than sit on it.
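A minimal sketch of the "reject before waiting" idea, assuming a much simplified dispatcher loop. The class and its fields are hypothetical stand-ins, not the AMQSession code, and the wait for restart is left out so the example stays single-threaded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the dispatcher loop; not the Qpid implementation.
class RejectBeforeWaitSketch {
    private final Queue<String> preDispatchQueue = new ArrayDeque<String>();
    final List<String> delivered = new ArrayList<String>();
    final List<String> rejected = new ArrayList<String>();
    private boolean connectionStopped;

    void enqueue(String message) {
        preDispatchQueue.add(message);
    }

    void setConnectionStopped(boolean stopped) {
        connectionStopped = stopped;
    }

    // One pass of the dispatch loop. Returns false when the queue is empty.
    boolean dispatchOne() {
        String message = preDispatchQueue.poll();
        if (message == null) {
            return false;
        }
        if (connectionStopped) {
            // Reject first, so nothing is held while the dispatcher is stopped.
            // A real dispatcher would wait for a restart only after this point.
            rejected.add(message);
            return true;
        }
        delivered.add(message);
        return true;
    }
}
```

The point of the ordering is that the dispatcher never holds a message across the stop: by the time it would wait, the message has already been handed back.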
> Note: Now that I look at this a bit more, the rollback code (L2754) looks to
> be over-synchronized. I'm not sure the dispatcher will ever actually stop on
> the wait at L2877 during rollback, as the dispatcher is stopped and started
> again inside a single synchronized block, which would prevent the dispatcher
> from reaching the wait. So it will more likely block on the sync at L2870.
> If we move the setConnectionStopped calls out of the sync block and ensure
> that the _rollbackMark is updated before the connection is stopped, then we
> should be OK.
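The sequence suggested in the note could look roughly like the sketch below. It rests on assumptions the issue text does not state: that the rollback mark is a delivery-tag watermark, and that messages at or below it should be rejected. All names are hypothetical; only the order of the steps is the point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed ordering; not the AMQSession code.
class RollbackMarkSketch {
    private final Object deliveryLock = new Object();
    // Assumption: the mark is the highest delivery tag seen before rollback.
    private final AtomicLong rollbackMark = new AtomicLong(-1);
    final List<String> steps = new ArrayList<String>();
    private volatile boolean connectionStopped;
    private volatile long highestDeliveryTag = -1;

    void received(long deliveryTag) {
        highestDeliveryTag = deliveryTag;
    }

    private void setConnectionStopped(boolean stopped) {
        connectionStopped = stopped;
        steps.add(stopped ? "stop" : "start");
    }

    void rollback() {
        // 1. Update the mark before the dispatcher is stopped.
        rollbackMark.set(highestDeliveryTag);
        steps.add("mark");
        // 2. Stop the dispatcher outside the synchronized block.
        setConnectionStopped(true);
        synchronized (deliveryLock) {
            // 3. Only the rollback itself runs under the lock.
            steps.add("rollback");
        }
        // 4. Restart the dispatcher, again outside the lock.
        setConnectionStopped(false);
    }

    // Dispatcher-side check: messages at or below the mark are rejected.
    boolean shouldReject(long deliveryTag) {
        return deliveryTag <= rollbackMark.get();
    }
}
```

Because the mark is set first, a dispatcher that is still running when the stop arrives can already tell that its current message belongs to the rolled-back transaction.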
--
This message is automatically generated by JIRA.
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:[email protected]