[jira] Updated: (QPID-1871) During Rollback Client Rejects Message after sending TxRollback

Martin Ritchie (JIRA) Fri, 22 May 2009 03:27:10 -0700

     [ https://issues.apache.org/jira/browse/QPID-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Martin Ritchie updated QPID-1871:
---------------------------------

    Description: 
Summary:
See QPID-1864 for annotated log output.

The log output is from a run with the Java broker, but highlights that the 
client dispatcher thread is not synchronized with the main thread during 
rollback.
As a result the main thread sends the TxRollback before the Dispatcher has sent 
its Reject message. On the Java broker, this results in the unrejected message 
being redelivered, possibly out of order, depending on what other messages 
have been released on the message queue.

If we are to continue to rely on the dispatcher thread rejecting/releasing the 
message it is currently processing (i.e. the message that is neither in the 
_queue (preDispatchQueue) nor in the _synchronousQueue for receiver delivery), 
then we will need to synchronize with the main thread's rollback/recover calls 
so that the dispatcher can finish processing its message before the 
rollback/recover completes.
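
As a rough illustration of that synchronization (a sketch only, not the Qpid 
client code): the dispatcher holds a lock for as long as it owns a message, and 
rollback takes the same lock before sending TxRollback. The class 
RollbackHandshake, the inFlight lock and the "wire" list are all hypothetical 
names invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: none of these names exist in the Qpid client.
class RollbackHandshake {
    // Held by the dispatcher while it owns a message that is in neither queue.
    private final ReentrantLock inFlight = new ReentrantLock();
    // Stand-in for the network: records frames in the order they are "sent".
    private final List<String> wire =
            Collections.synchronizedList(new ArrayList<String>());

    // Dispatcher side: take ownership of a message and reject it under the lock.
    void dispatchOne(CountDownLatch holdingMessage) {
        inFlight.lock();
        try {
            holdingMessage.countDown(); // message has left the pre-dispatch queue
            wire.add("Reject");         // dispatcher hands the message back
        } finally {
            inFlight.unlock();
        }
    }

    // Main-thread side: TxRollback cannot be sent while a message is in flight.
    void rollback() {
        inFlight.lock();
        try {
            wire.add("TxRollback");
        } finally {
            inFlight.unlock();
        }
    }

    // Runs one dispatcher thread against one rollback call and returns the frame order.
    static List<String> run() {
        final RollbackHandshake session = new RollbackHandshake();
        final CountDownLatch holdingMessage = new CountDownLatch(1);
        Thread dispatcher = new Thread(() -> session.dispatchOne(holdingMessage));
        dispatcher.start();
        try {
            holdingMessage.await(); // wait until the dispatcher owns a message
            session.rollback();     // blocks until the Reject has been recorded
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<String>(session.wire);
    }
}
```

Because rollback() cannot acquire the lock until the dispatcher releases it, 
the Reject is always recorded before the TxRollback in this sketch.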

The message that the Dispatcher thread has can be seen in AMQSession 
L:2866:dispatchMessage(). 
On rollback we stop the dispatcher (L2763), which can result in the dispatcher 
thread stopping on L2877 and holding on to the message it is in the middle of 
delivering. More likely, during recover, the dispatcher will block on the lock 
at L2870.

When the dispatcher is restarted (L2792) it is then free to reject its message. 
However, the next call made by the thread that restarted the dispatcher is to 
send the rollback command (L1553), which is where the race condition occurs.

Potential Fix:
Message Rejection should be performed BEFORE we stop the dispatcher.
On L:2825 we remove the message from the _queue (preDispatchQueue) and then 
potentially sit on the message at L:2877 when we get stopped.

If the reject call at L:2888 came before the wait, then we could reject the 
message rather than sit on it.
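
The idea can be sketched as follows. This is a simplified, hypothetical 
dispatcher loop body, not the AMQSession code: the class 
RejectBeforeWaitSketch and its fields are invented for illustration, and 
messages are plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a dispatcher that never waits while holding a message.
class RejectBeforeWaitSketch {
    final Deque<String> preDispatchQueue = new ArrayDeque<>();
    final List<String> delivered = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();
    private boolean stopped;

    synchronized void setStopped(boolean value) {
        stopped = value;
    }

    // One pass of the dispatcher loop; returns false when the queue is empty.
    synchronized boolean dispatchNext() {
        String message = preDispatchQueue.poll();
        if (message == null) {
            return false;
        }
        if (stopped) {
            // Reject first. Only after this would a real dispatcher wait for
            // a restart, and it would then be waiting with empty hands.
            rejected.add(message);
            return true;
        }
        delivered.add(message);
        return true;
    }
}
```

With this ordering, a message taken from the queue while the dispatcher is 
stopped is handed back immediately instead of being held until the restart.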

Note: Now that I look at this a bit more, the rollback (L2754) code looks to be 
over-synchronized. I'm not sure the dispatcher will ever actually stop on the 
wait at L2877 during rollback, as the dispatcher is stopped and started again 
inside the one synchronisation, which would prevent the dispatcher getting to 
the wait. So it will more likely block on the sync at L2870.
If we move the setConnectionStopped calls out of the sync block and ensure 
that the _rollbackMark is updated before the connection is stopped, then we 
should be OK.
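
A minimal sketch of that ordering, under stated assumptions: the class 
RollbackOrderingSketch, deliveryLock and shouldReject() are hypothetical, and 
treating the rollback mark as "reject every delivery tag at or below this 
value" is a guess for illustration, not something this issue confirms.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed ordering; not the AMQSession code.
class RollbackOrderingSketch {
    private final Object deliveryLock = new Object();
    private final AtomicLong rollbackMark = new AtomicLong(-1L);
    private volatile boolean stopped;

    void rollback(long highestDeliveredTag, Runnable releaseQueued) {
        rollbackMark.set(highestDeliveredTag); // 1. update the mark first
        stopped = true;                        // 2. stop without holding the lock
        try {
            synchronized (deliveryLock) {      // 3. lock only the queue clean-up
                releaseQueued.run();
            }
        } finally {
            stopped = false;                   // 4. restart, again outside the lock
        }
    }

    boolean isStopped() {
        return stopped;
    }

    // Assumed dispatcher-side check: tags at or below the mark are rejected.
    boolean shouldReject(long deliveryTag) {
        return deliveryTag <= rollbackMark.get();
    }
}
```

Because the mark is written before the volatile stopped flag, any thread that 
observes stopped == true also observes the updated mark.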

  was:
Summary:
See QPID-1864 for annotated log output.

The log output is from a run with the Java broker, but highlights that the 
client dispatcher thread is not synchronized with the main thread during 
rollback.
As a result the main thread sends the TxRollback before the Dispatcher has sent 
its Reject message. This results, on the java broker, of the unrejected message 
being redelivered, which may be out of order depending on what other messages 
have been released on the message queue.

If we are to continue to rely on the dispatcher thread rejecting/releasing the 
message it is currently processing (i.e. the message that is neither in the 
_queue preDispatchQueue nor the _synchronousQueue for receiver delivery) then 
we will need to synchronize with the main thread's rollback/recover calls so 
that the dispatcher can finish processing its message before the 
rollback/recover completes.

The message that the Dispatcher thread has can be seen in AMQSession 
L:2866:dispatchMessage(). 
On Rollback we stop the dispatcher (L2763) which can result in the dispatcher 
thread stopping on L2877 and holding on the the message it is in the middle of 
delivery.

When the dispatcher is restarted (L2792) it is then free to reject its message. 
However, the thread that restarted the dispatcher's next call is to send the 
rollback command(L1553) Which is where the race condition occurs.

Potential Fix:
Message Rejection should be performed BEFORE we stop the dispatcher.
On L:2825 we remove the message from the _queue (preDispatchQueue) and then 
potentailly sit on the message L:2877 when we get stopped.

If the reject call in L:2888 was before the wait then we could reject the 
message rather than sit on it.

Note: Now that I look at this a bit more the rollback (L2754) code looks to be 
over synchronized. I'm not sure the dispatcher will actually ever stop on the 
wait L2877 during rollback as the dispatcher is stopped and started again 
inside the one syncronisation which would prevent the dispatcher getting to the 
wait. 
Moving the setConnectionStopped calls out of the sync block along and ensuring 
that the _rollbackMark is updated before the connection is stopped then we 
should ok.


> During Rollback Client Rejects Message after sending TxRollback
> ---------------------------------------------------------------
>
>                 Key: QPID-1871
>                 URL: https://issues.apache.org/jira/browse/QPID-1871
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker, Java Client
>    Affects Versions: M4, 0.5
>            Reporter: Martin Ritchie

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
