[ 
https://issues.apache.org/jira/browse/QPID-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036452#comment-13036452
 ] 

Rajith Attapattu commented on QPID-3259:
----------------------------------------

This is not a deadlock, although it does give the illusion of one. If you let 
it sit for a while, you will see a timeout like the following:
"Caused by: org.apache.qpid.AMQException: timed out waiting for sync: complete 
= 0, point = 4 [error code 541: internal error]"

(There is also a race condition here that determines whether this 
quasi-deadlock happens, which I have described under Reproducibility.)

Root Cause
-------------
This quasi-deadlock happens due to the following.
When the queue capacity is exceeded, the broker closes the session and issues 
an Execution Exception, which the client tries to report via 
"AMQConnection.exceptionReceived". Inside this method the client needs to 
acquire the "FailoverMutex" to proceed.

Meanwhile, the test code has invoked session.close(), which has already 
obtained the FailoverMutex, issued a session close, and called sync() to wait 
for the broker to respond. The broker will not respond because that session is 
already closed.

Meanwhile, the IO thread that drives the setting of the exception is blocked 
waiting for the FailoverMutex (inside "AMQConnection.exceptionReceived").

This gives the illusion of a deadlock. Eventually the sync call times out and 
you will see the above error message.
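
The interaction described above can be sketched in plain Java. This is only an 
illustration under stated assumptions, not the Qpid client code: the class 
QuasiDeadlockSketch, its fields, and its methods are hypothetical stand-ins 
for the FailoverMutex, session.close()/sync(), and 
AMQConnection.exceptionReceived. The broker reply is modelled as a latch that 
is never released, and the sketch forces the unlucky ordering in which close() 
takes the mutex first.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class QuasiDeadlockSketch {
    // Hypothetical stand-in for the connection's FailoverMutex.
    private final Object failoverMutex = new Object();
    // Signals that the closing thread now holds the mutex.
    private final CountDownLatch closeHoldsMutex = new CountDownLatch(1);
    // Models the broker's reply to sync(); never counted down, because the
    // broker has already closed the session.
    private final CountDownLatch brokerReply = new CountDownLatch(1);

    /** Stand-in for session.close(): hold the mutex, then wait for a reply. */
    boolean closeSession(long syncTimeoutMillis) throws InterruptedException {
        synchronized (failoverMutex) {
            closeHoldsMutex.countDown();
            // Stand-in for sync(): returns false once the timeout elapses.
            return brokerReply.await(syncTimeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Stand-in for exceptionReceived on the IO thread: needs the same mutex. */
    void exceptionReceived() {
        synchronized (failoverMutex) {
            // The real method would propagate the exception from here.
        }
    }

    /** True if the wait timed out and the IO thread then ran to completion. */
    public static boolean timesOutThenRecovers(long syncTimeoutMillis) {
        QuasiDeadlockSketch s = new QuasiDeadlockSketch();
        Thread io = new Thread(() -> {
            try {
                s.closeHoldsMutex.await(); // force the ordering: close() first
                s.exceptionReceived();     // blocks until close() lets go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "io-thread");
        io.start();
        try {
            boolean replied = s.closeSession(syncTimeoutMillis);
            io.join(5000); // the IO thread can proceed once the mutex is free
            return !replied && !io.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out, then recovered: "
                + timesOutThenRecovers(500));
    }
}
```

While the simulated sync() is waiting, a thread dump of this sketch would show 
one thread in a timed wait holding the mutex and the IO thread blocked on it, 
which looks like a deadlock until the timeout fires.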


Reproducibility
----------------
As mentioned at the beginning, there is a race condition that determines 
whether this issue happens.
If the execution exception is notified before session close is invoked, this 
will not happen.

E.g., if you put a delay in the test code (say, Thread.sleep(1000) before 
session.close()), you will see that this issue does not arise.
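
The effect of the ordering can be shown with a small self-contained sketch. 
Again, this is a simplified model and not the Qpid implementation: 
CloseOrderingSketch and everything in it are hypothetical names, and it 
assumes that once the exception has been delivered the session is marked 
closed, so a later close() has nothing to sync on. Joining the IO thread 
before calling close() plays the role of the Thread.sleep(1000) in the test.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class CloseOrderingSketch {
    // Hypothetical stand-in for the FailoverMutex.
    private final Object failoverMutex = new Object();
    private final CountDownLatch closeHoldsMutex = new CountDownLatch(1);
    // Never counted down: the broker does not answer for a closed session.
    private final CountDownLatch brokerReply = new CountDownLatch(1);
    private volatile boolean sessionClosed = false;

    /** IO-thread side: mark the session closed under the mutex. */
    void exceptionReceived() {
        synchronized (failoverMutex) {
            sessionClosed = true;
        }
    }

    /** Application side: returns true if the simulated sync timed out. */
    boolean close(long timeoutMillis) throws InterruptedException {
        synchronized (failoverMutex) {
            closeHoldsMutex.countDown();
            if (sessionClosed) {
                return false; // assumption: already closed, nothing to sync
            }
            return !brokerReply.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    public static boolean closeTimesOut(boolean exceptionDeliveredFirst,
                                        long timeoutMillis) {
        CloseOrderingSketch s = new CloseOrderingSketch();
        Thread io = new Thread(() -> {
            try {
                if (!exceptionDeliveredFirst) {
                    s.closeHoldsMutex.await(); // let close() win the race
                }
                s.exceptionReceived();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "io-thread");
        io.start();
        try {
            if (exceptionDeliveredFirst) {
                io.join(); // exception lands before close() is invoked
            }
            boolean timedOut = s.close(timeoutMillis);
            io.join();
            return timedOut;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("close first:     " + closeTimesOut(false, 500));
        System.out.println("exception first: " + closeTimesOut(true, 500));
    }
}
```

In the real client the ordering is decided by thread scheduling and network 
timing rather than by a flag, which is why the issue shows up only 
intermittently.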


Fix
----
It's not trivial to fix this issue. As mentioned previously, the FailoverMutex 
(which I honestly believe is the root cause of all evil) is used to protect 
many operations. Simply rearranging the code to sidestep the issue could 
result in unexpected behaviour.

Both the failover and error handling logic may need some tweaking to fix these 
issues properly. I suspect there may be more issues like this.

> Deadlock on Java client side while closing session when topic operation is 
> unauthorized
> ---------------------------------------------------------------------------------------
>
>                 Key: QPID-3259
>                 URL: https://issues.apache.org/jira/browse/QPID-3259
>             Project: Qpid
>          Issue Type: Bug
>         Environment: Java client runs into a deadlock when it tries to close 
> session when a topic operation (publish/subscribe) is not authorized.
> In this situation AMQConnection (in exceptionReceived) tries to grab failover 
> mutex and runs into a lock. 
> The other issue in this case is that AMQException.isHardError always returns 
> true and hence the connection tries to close all sessions inside 
> exceptionReceived method. I think there is something wrong here as an 
> unauthorized operation in one session should not lead to closing all other 
> sessions.
>            Reporter: Danushka Menikkumbura
>            Priority: Critical
>         Attachments: QPID-3259-SampleClient, QPID-3259-ThreadDump
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
