A new interface was added to the core workqueue API to make handling
cancel_delayed_work() deadlocks easier, so a simpler fix for bug 13757
as below becomes possible.  Bart, it would be great if you could retest
this, since it is what I am planning on sending upstream for 2.6.31.
(This patch depends on 4e49627b, "workqueues: introduce
__cancel_delayed_work()", which was merged for 2.6.31-rc9; alternatively
my for-next branch is now rebased on top of -rc9 and has this patch plus
everything else queued for 2.6.32).

Thanks,
  Roland


Lockdep reported a possible deadlock with cm_id_priv->lock,
mad_agent_priv->lock and mad_agent_priv->timed_work.timer; this
happens because the mad module does

        cancel_delayed_work(&mad_agent_priv->timed_work);

while holding mad_agent_priv->lock.  cancel_delayed_work() internally
does del_timer_sync(&mad_agent_priv->timed_work.timer).

This can turn into a deadlock because mad_agent_priv->lock is taken
inside cm_id_priv->lock, so we can get the following set of contexts
that deadlock each other:

 A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
 B: holding mad_agent_priv->lock, waiting for del_timer_sync()
 C: interrupt during mad_agent_priv->timed_work.timer that takes
    cm_id_priv->lock

Fix this by using the new __cancel_delayed_work() interface (which
internally does del_timer() instead of del_timer_sync()) in all the
places where we are holding a lock.

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanass...@gmail.com>
Signed-off-by: Roland Dreier <rola...@cisco.com>
---
 drivers/infiniband/core/mad.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index de922a0..bc30c00 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1974,7 +1974,7 @@ static void adjust_timeout(struct ib_mad_agent_private 
*mad_agent_priv)
        unsigned long delay;
 
        if (list_empty(&mad_agent_priv->wait_list)) {
-               cancel_delayed_work(&mad_agent_priv->timed_work);
+               __cancel_delayed_work(&mad_agent_priv->timed_work);
        } else {
                mad_send_wr = list_entry(mad_agent_priv->wait_list.next,
                                         struct ib_mad_send_wr_private,
@@ -1983,7 +1983,7 @@ static void adjust_timeout(struct ib_mad_agent_private 
*mad_agent_priv)
                if (time_after(mad_agent_priv->timeout,
                               mad_send_wr->timeout)) {
                        mad_agent_priv->timeout = mad_send_wr->timeout;
-                       cancel_delayed_work(&mad_agent_priv->timed_work);
+                       __cancel_delayed_work(&mad_agent_priv->timed_work);
                        delay = mad_send_wr->timeout - jiffies;
                        if ((long)delay <= 0)
                                delay = 1;
@@ -2023,7 +2023,7 @@ static void wait_for_response(struct 
ib_mad_send_wr_private *mad_send_wr)
 
        /* Reschedule a work item if we have a shorter timeout */
        if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list) {
-               cancel_delayed_work(&mad_agent_priv->timed_work);
+               __cancel_delayed_work(&mad_agent_priv->timed_work);
                queue_delayed_work(mad_agent_priv->qp_info->port_priv->wq,
                                   &mad_agent_priv->timed_work, delay);
        }
-- 
1.6.4

_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to