Fix a crasher bug in IPoIB CM: once QP is in RTR, an RX completion (and even an
asynchronous error) might be observed on this QP, so we have to initialize all
RX fields beforehand.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

---

> Quoting Woodruff, Robert J <[EMAIL PROTECTED]>:
> Subject: RE: [ofa-general] crash in ipoib
> 
> Sean wrote,
> >> And here's a version with error handling fixed.
> >> Sean, does this solve your crash?
> 
> >We've been running this patch since yesterday and haven't seen any 
> >crashes.  We'll continue testing this over the week-end.
> 
> >- Sean
> 
> This looks like it fixed the panic. 
> 
> Should we try to put out a new RC with this latest ipoib fix ?
> I really think we need it in the release. If we could get another RC out
> today,
> that would only delay the release by a couple of more days and we could
> release on next Friday rather than wed. and still give people a week to 
> test the final RC.
> 
> woody

OK, the following patch has been added to OFED 1.2.
Roland, please consider this bugfix for 2.6.22.

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..c64249f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -309,6 +309,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, 
struct ib_cm_event *even
                return -ENOMEM;
        p->dev = dev;
        p->id = cm_id;
+       cm_id->context = p;
+       p->state = IPOIB_CM_RX_LIVE;
+       p->jiffies = jiffies;
+       INIT_LIST_HEAD(&p->list);
+
        p->qp = ipoib_cm_create_rx_qp(dev, p);
        if (IS_ERR(p->qp)) {
                ret = PTR_ERR(p->qp);
@@ -320,24 +325,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, 
struct ib_cm_event *even
        if (ret)
                goto err_modify;
 
+       spin_lock_irq(&priv->lock);
+       queue_delayed_work(ipoib_workqueue,
+                          &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+       /* Add this entry to passive ids list head, but do not re-add it
+        * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+       p->jiffies = jiffies;
+       if (p->state == IPOIB_CM_RX_LIVE)
+               list_move(&p->list, &priv->cm.passive_ids);
+       spin_unlock_irq(&priv->lock);
+
        ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
        if (ret) {
                ipoib_warn(priv, "failed to send REP: %d\n", ret);
-               goto err_rep;
+               if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+                       ipoib_warn(priv, "unable to move qp to error state\n");
        }
-
-       cm_id->context = p;
-       p->jiffies = jiffies;
-       p->state = IPOIB_CM_RX_LIVE;
-       spin_lock_irq(&priv->lock);
-       if (list_empty(&priv->cm.passive_ids))
-               queue_delayed_work(ipoib_workqueue,
-                                  &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-       list_add(&p->list, &priv->cm.passive_ids);
-       spin_unlock_irq(&priv->lock);
        return 0;
 
-err_rep:
 err_modify:
        ib_destroy_qp(p->qp);
 err_qp:

-- 
MST
_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

Reply via email to