> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote:
> > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java,
> >  lines 744-745
> > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line744>
> >
> >     What impact does dropping the message on the floor here have on the 
> > client?
> >     
> >     Earlier points in the call hierarchy seem to make effort to do other 
> > things with the message when detecting session/consumer close, so is there 
> > any impact from not doing so? E.g a message getting stuck acquired for a 
> > now-closed consumer?
> >     
> >     Does any particular attention need paid to the overridden 0-10 specific 
> > version of this method?
> 
> rajith attapattu wrote:
>     The message will be dropped if a consumer (or session is closed or 
> closing).
>     When a consumer is closed, any messages acquired but not acknowledged 
> should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?
>     
>     The situation is the same as a consumer closing with a bunch of unacked 
> messages when in CLIENT_ACK mode.
>     Alternatively we could reject the message (release in 0-10 terms). But I 
> don't think this is required, given that we will be closing the consumer 
> anyways.
>     
>     
>     >Earlier points in the call hierarchy seem to make effort to do other 
> things with the message when detecting session/consumer close
>     By "other things" are you referring to a reject? In 0-10 AFIK you don't 
> need to do it. 
>     The situation is the same as a consumer closing with a bunch of unacked 
> messages when in CLIENT_ACK mode.
>     
>     >Does any particular attention need paid to the overridden 0-10 specific 
> version of this method?
>     IMO adding "if (!(isClosed() || isClosing()))" is required to prevent the 
> 0-10 specific method from issuing credit (and receiving more messages, that 
> will eventually get released).
>     There is a chance where the consumer could be marked closed, but not yet 
> reached the point of sending a cancel, by the time messageFlow() is called.
>     The above check should prevent it.
> 
> Robbie Gemmell wrote:
>     "Did you mean race condition (instead of deadlock)?"
>     
>     I really meant deadlock. One of the uses of _messageDeliveryLock being 
> removed was added as part of the fix for a deadlock 
> (https://issues.apache.org/jira/browse/QPID-3911) and so I wonder what effect 
> removing it again will have. It may be the other changes mean it isn't a 
> problem, but I think it needs to be closely considered. 
>     
>     "The message will be dropped if a consumer (or session is closed or 
> closing).
>     When a consumer is closed, any messages acquired but not acknowledged 
> should be be made available to another consumer by the broker.
>     These messages will be marked redelivered.
>     Is this the same situation for 0-8/0-9 ?"
>     
>     That isn't actually the case, transferred messages are the responsibility 
> of the session and remain acquired after consumer close until such point as 
> the session explicitly does something with them. I believe this is true of 
> 0-8/9/91 as well, and probably explains the origin of the reject/release code 
> I referred to earlier in the call tree than the changes would now allow the 
> message to be silently dropped.
>     
>     From the 0-10 spec: "Canceling a subscription MUST NOT affect pending 
> transfers. A transfer made prior to canceling transfers to the destination 
> MUST be able to be accepted, released, acquired, or rejected after the 
> subscription is canceled."
> 
> rajith attapattu wrote:
>     Indeed, Gordon mentioned that to me and the 2nd patch (the one before I 
> did today) takes care of rejecting messages from the consumer. We don't need 
> to do the same when the session is closing, as when the session ends, the any 
> unacked messages are put back on the queue.
> 
> rajith attapattu wrote:
>     As for QPID-3911,
>     There is a deadlock, albeit a bit different (involving the same locks) 
> from QPID-3911, that do happen in similar circumstances.
>     However this deadlock appears to manifest with or without this patch, 
> which leads me to believe that _messageDeliveryLock is not the right solution 
> for QPID-3911.
>     Sadly the solution for QPID-3911 made it worse as there are at least two 
> distinct cases of deadlocks involving _messageDeliveryLock.
>     1. Btw _lock and _messageDeliveryLock
>     2. Btw _messageDeliveryLock and _failoverMutex.
>     
>     We definitely need to find a solution for the deadlocks (at least 3 
> cases) btw failoverMutex and _lock (in AMQSession), which seems to be the 
> root of all evil :)
>     We might have to drop one lock (most likely _lock) and see if we can 
> provide an alternative strategy to guarantee correct behaviour.
>
> 
> rajith attapattu wrote:
>     Sorry I meant dopping _failoverMutex (not _lock in AMQSession).
>     It might also be an opportunity to fix our less than stellar failover.
> 
> Robbie Gemmell wrote:
>     The deadlock itself wasn't really the main issue in QPID-3911, it was 
> just a very obvious symptom that presented after the underlying problem had 
> occurred, which was that the client sent illegal commands to the broker due 
> to allowing things to occur at the same time (/in the wrong order) that it 
> shouldn't had, namely suspending the session(/consumers) during session 
> rollback inside onMessage() at the same time as a consumer close occurring on 
> the session in another thread. The deliveryLock presented a route to stop 
> that by preventing the possibility that the dispatcher would be inside user 
> code at the time and thus make it unable to cause the problem.
>     
>     As you say, while the changes fixed that problem it seems they introduced 
> another. It so far appears to me like that is what using the current changes 
> here would be doing, by returning the client to the state where it could be 
> delivering messages on the session [for another consumer] that could allow 
> rollback while a consumer close is in progress, but with the distinction that 
> we would know in advance this time that there could be a problem with the 
> change.
>     
>     Having a quick look through the client code, it seems like an avenue you 
> could pursue to fix the issue (for 0-10 at least) might be the AMQSession 
> suspensionLock. It is only used inside AMQSession currently, in a way that 
> looks like its acquisition wont be followed by executing user code or 
> acquiring any more locks, and so at first glance looks like it could be safe 
> to acquire during the consumer close as well in order to prevent the 
> concurrent suspend+close that actually caused the problem for 0-10 in 
> QPID-3911. That would only be part of the fix though as it would probably 
> allow a consumer close to block the session rollback only for that to then 
> proceed afterwards and cause the same illegal commands to be sent. To prevent 
> this it seems like ensuring we don't operate on closing/closed consumers in 
> AMQSession_0_10#sendSuspendChannel(..) would stop the client issuing the 
> stop/flow commands for the now-closing/closed consumers. I haven't spent 
> enough time looking at this to know if there are any side effects from doing 
> that though, or if that would fix the related issues seen with 0-8/9/91 which 
> were slightly different and didn't actually involve a deadlock but rather a 
> command timeout, so such things would need investigated if you pursued the 
> approach.

What you describe could happen and obtaining the suspension lock could be a 
potential solution.
The same happens when the broker closes the session while rollback is in 
progress.
In both situations a deadlock happens.

Another potential solution is (which I think we should do anyways) is to check 
if the session is marked closed before we send anything to the broker.
As soon as we enter session close() we should mark the session as being closed.
There are a lot of methods where we just send commands on the wire without 
checking this and it is causing issues.

I will investigate both.


- rajith


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10738/#review20483
-----------------------------------------------------------


On May 21, 2013, 2:48 p.m., rajith attapattu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10738/
> -----------------------------------------------------------
> 
> (Updated May 21, 2013, 2:48 p.m.)
> 
> 
> Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey.
> 
> 
> Description
> -------
> 
> There are at least 3 cases where the deadlock btw _messageDeliveryLock and 
> _failoverMutex surfaces. Among them sending a message inside onMessage() and 
> the session being closed due to an error (causing the deadlock) seems to come 
> up a lot in production environments. There is also a deadlock btw 
> _messageDeliveryLock and _lock (AMQSession.java) which happens less 
> frequently.
> The messageDeliveryLock is used to ensure that we don't close the session in 
> the middle of a message delivery. In order to do this we hold the lock across 
> onMessage().
> This causes several issues in addition to the potential to deadlock. If an 
> onMessage call takes longer/wedged then you cannot close the session or 
> failover will not happen until it returns as the same thread is holding the 
> failoverMutex. 
> 
> Based on an idea from Rafi, I have come up with a solution to get rid of 
> _messageDeliveryLock and instead use an alternative strategy to achieve 
> similar functionality.
> In order to ensure that close() doesn't proceed until the message deliveries 
> currently in progress completes, an atomic counter is used to keep track of 
> message deliveries in progress.
> The close() will wait until the count falls to zero before proceeding. No new 
> deliveries will be initiated bcos the close method will mark the session as 
> closed.
> The wait has a timeout to ensure that a longer running or wedged onMessage() 
> will not hold up session close.
> There is a slim chance that before a session being marked as closed a message 
> delivery could be initiated, but not yet gotten to the point of updating the 
> counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed 
> with close. But in comparison to the issues with _messageDeliveryLock, I 
> believe it's acceptable.
> 
> There is an issue if MessageConsumer close is called outside of Session 
> close. This can be solved in a similar manner. I will wait until the current 
> review is complete and then post the solution for the MessageConsumer close.
> I will commit them both together.
> 
> 
> This addresses bug QPID-4574.
>     https://issues.apache.org/jira/browse/QPID-4574
> 
> 
> Diffs
> -----
> 
>   
> http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java
>  1484812 
>   
> http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java
>  1484812 
>   
> http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java
>  1484812 
>   
> http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java
>  1484812 
>   
> http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/util/ConditionManager.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10738/diff/
> 
> 
> Testing
> -------
> 
> Java test suite, tests from customers and QE around the deadlock situation.
> 
> 
> Thanks,
> 
> rajith attapattu
> 
>

Reply via email to