[ 
https://issues.apache.org/jira/browse/QPID-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Ritchie reassigned QPID-1871:
------------------------------------

    Assignee: Martin Ritchie

> During Rollback Client Rejects Message after sending TxRollback
> ---------------------------------------------------------------
>
>                 Key: QPID-1871
>                 URL: https://issues.apache.org/jira/browse/QPID-1871
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker, Java Client
>    Affects Versions: M4, 0.5
>            Reporter: Martin Ritchie
>            Assignee: Martin Ritchie
>
> Summary:
> See QPID-1864 for annotated log output.
> The log output is from a run with the Java broker, but highlights that the 
> client dispatcher thread is not synchronized with the main thread during 
> rollback.
> As a result the main thread sends the TxRollback before the Dispatcher has 
> sent its Reject message. On the Java broker this results in the unrejected 
> message being redelivered, possibly out of order depending on what other 
> messages have been released on the message queue.
> If we are to continue to rely on the dispatcher thread rejecting/releasing 
> the message it is currently processing (i.e. the message that is neither in 
> the _queue preDispatchQueue nor the _synchronousQueue for receiver delivery) 
> then we will need to synchronize with the main thread's rollback/recover 
> calls so that the dispatcher can finish processing its message before the 
> rollback/recover completes.
> The message that the Dispatcher thread is holding can be seen in AMQSession 
> L:2866:dispatchMessage(). 
> On rollback we stop the dispatcher (L2763), which can result in the dispatcher 
> thread stopping on L2877 and holding on to the message it is in the middle 
> of delivering. More likely, during recover the dispatcher will block on the 
> lock at L2870.
> When the dispatcher is restarted (L2792) it is then free to reject its 
> message. However, the next call made by the thread that restarted the 
> dispatcher is to send the rollback command (L1553), which is where the race 
> condition occurs.
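
The ordering described above can be modelled outside the client. The following is a minimal sketch, not AMQSession code: the class name, the latches and the "wire" list are all hypothetical stand-ins, and the rollbackSent latch deliberately forces the losing interleaving so that the outcome is repeatable. In the real client the order depends on thread scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the race; none of these names exist in the Qpid client.
public class RollbackRaceSketch {

    /** Returns the frames in the order they reach the simulated wire. */
    public static List<String> run() {
        List<String> wire = new CopyOnWriteArrayList<>();
        CountDownLatch holdingMessage = new CountDownLatch(1);
        CountDownLatch restarted = new CountDownLatch(1);
        CountDownLatch rollbackSent = new CountDownLatch(1);

        Thread dispatcher = new Thread(() -> {
            try {
                // The dispatcher has taken a message off the queue and is then stopped.
                holdingMessage.countDown();
                restarted.await();
                // Assumption for the demo: the main thread wins the race.
                rollbackSent.await();
                wire.add("Reject");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        dispatcher.start();

        try {
            holdingMessage.await();
            restarted.countDown();      // restart the dispatcher
            wire.add("TxRollback");     // the very next call sends the rollback
            rollbackSent.countDown();
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(wire);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the TxRollback frame is recorded before the Reject frame, which is the ordering the log output in QPID-1864 is reported to show.
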
> Potential Fix:
> Message Rejection should be performed BEFORE we stop the dispatcher.
> On L:2825 we remove the message from the _queue (preDispatchQueue) and then 
> potentially sit on the message at L:2877 when we get stopped.
> If the reject call at L:2888 came before the wait then we could reject the 
> message rather than sit on it.
> Note: Now that I look at this a bit more, the rollback code (L2754) looks to 
> be over-synchronized. I'm not sure the dispatcher will ever actually stop on 
> the wait at L2877 during rollback, as the dispatcher is stopped and started 
> again inside a single synchronized block, which would prevent the dispatcher 
> from getting to the wait. So it will more likely block on the sync at L2870.
> If we move the setConnectionStopped calls out of the sync block and ensure 
> that the _rollbackMark is updated before the connection is stopped, then we 
> should be OK.
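
One way to picture the proposed ordering is a small model in which the dispatcher rejects its in-flight message as soon as it sees the stop, and the rollback waits for that reject before sending TxRollback. This is only a sketch under stated assumptions: the class, fields and methods below are hypothetical and are not taken from AMQSession, and the single monitor with wait()/notifyAll() is an illustrative choice rather than the fix that was applied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "reject before rollback"; not the Qpid implementation.
public class RejectBeforeRollbackSketch {
    private final Object lock = new Object();
    private final List<String> wire = new ArrayList<>(); // frames in send order; guarded by lock
    private boolean stopped;                             // guarded by lock
    private String inFlight;                             // message held by the dispatcher; guarded by lock

    // Dispatcher side: hold one message, reject it once a stop is seen, then park.
    private void dispatch() throws InterruptedException {
        synchronized (lock) {
            inFlight = "msg-1";          // taken off the pre-dispatch queue
            lock.notifyAll();
            while (!stopped) {
                lock.wait();
            }
            wire.add("Reject " + inFlight); // reject BEFORE parking on the stop
            inFlight = null;
            lock.notifyAll();
            while (stopped) {
                lock.wait();             // parked until the dispatcher is restarted
            }
        }
    }

    // Rollback side: stop, wait for the in-flight reject, send TxRollback, restart.
    private void rollback() throws InterruptedException {
        synchronized (lock) {
            while (inFlight == null) {
                lock.wait();             // demo only: make sure a message is in flight
            }
            stopped = true;
            lock.notifyAll();
            while (inFlight != null) {
                lock.wait();             // wait() releases the monitor so the dispatcher can run
            }
            wire.add("TxRollback");
            stopped = false;
            lock.notifyAll();
        }
    }

    /** Runs one rollback against one dispatcher thread and returns the frame order. */
    public static List<String> run() {
        RejectBeforeRollbackSketch s = new RejectBeforeRollbackSketch();
        Thread dispatcher = new Thread(() -> {
            try {
                s.dispatch();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        dispatcher.start();
        try {
            s.rollback();
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (s.lock) {
            return new ArrayList<>(s.wire);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because rollback() waits on the monitor rather than holding it throughout, the dispatcher is able to reach its reject while the rollback is in progress, and the Reject frame is recorded ahead of TxRollback regardless of which thread starts first.
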

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
