In the recent hangs, the process triggering the hang is using the umad
interface to query path records. Because we usually discover the problem long
after it starts, I'm not sure whether any queries are actually outstanding when
the hang occurs.

-----Original Message-----
From: Sean Hefty [mailto:[email protected]] 
Sent: Monday, May 03, 2010 2:40 PM
To: Mike Heinz; Roland Dreier
Cc: LINUX-RDMA
Subject: RE: Hang in ib_umad when attempting to unregister.

>I should be clearer: there are a couple of reasons why I don't think Roland's
>patch is either the cause of, or a fix for, this problem. First, when I dug
>through QLogic's bug database, I found incidents like this going back to 2007.
>Second, when I first began looking at this, I noticed the patch and built a
>version that moved the cancel_delayed_work() calls in ib_cancel_rmpp_recvs()
>back inside the locked area, and the problem still occurred.
>
>Finally, I should note that this isn't a spinlock-type hang. What's happening
>is that destroy_rmpp_recv() appears to be sleeping, waiting for a completion
>that never arrives. My guess is that the reference count in an rmpp_recv is
>wrong, but I don't know what is causing that.
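
To make the suspected failure mode concrete, here is a minimal user-space sketch of the reference-count-plus-completion pattern the description implies. It assumes destroy_rmpp_recv() drops its own reference and then sleeps until the last holder signals a completion. The struct and helper names (struct rmpp_recv_sketch, recv_get(), recv_put(), recv_destroy()) and the pthread-based completion are hypothetical stand-ins, not the actual ib_mad code. The kernel wait has no timeout; the sketch adds one only so a leaked reference can be observed instead of hanging the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space stand-in for a kernel struct completion. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Returns true if the completion fired within timeout_ms, false otherwise. */
static bool wait_for_completion_timeout_ms(struct completion *c, long timeout_ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += timeout_ms / 1000;
    ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    pthread_mutex_lock(&c->lock);
    int rc = 0;
    /* Loop to absorb spurious wakeups; stop on timeout or any error. */
    while (!c->done && rc == 0)
        rc = pthread_cond_timedwait(&c->cond, &c->lock, &ts);
    bool done = c->done;
    pthread_mutex_unlock(&c->lock);
    return done;
}

/* Hypothetical reassembly context guarded by a reference count. */
struct rmpp_recv_sketch {
    atomic_int refcount;
    struct completion comp;
};

static void recv_init(struct rmpp_recv_sketch *r)
{
    atomic_init(&r->refcount, 1); /* the creator's reference */
    init_completion(&r->comp);
}

static void recv_get(struct rmpp_recv_sketch *r)
{
    atomic_fetch_add(&r->refcount, 1);
}

/* Drop one reference; whoever drops the last one signals the completion. */
static void recv_put(struct rmpp_recv_sketch *r)
{
    int old = atomic_fetch_sub(&r->refcount, 1);
    assert(old > 0); /* catches an extra put (count going negative) */
    if (old == 1)
        complete(&r->comp);
}

/* Drop the creator's reference, then wait for every other holder to put. */
static bool recv_destroy(struct rmpp_recv_sketch *r, long timeout_ms)
{
    recv_put(r);
    return wait_for_completion_timeout_ms(&r->comp, timeout_ms);
}
```

With balanced get/put calls, recv_destroy() returns true. If any path takes a reference with recv_get() and never calls recv_put(), the completion is never signaled; without the timeout the caller would sleep indefinitely, which is the symptom described above.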

What RMPP messages were being sent/received?
