IPoIB: fix Oops when "join finish" runs just after the device was flushed

ipoib_mcast_join_finish() can run just after ipoib_mcast_dev_flush() has
been invoked, in which case the priv->broadcast pointer is NULL and
dereferencing it causes an Oops.  This patch checks for that case under
priv->lock and returns -EAGAIN instead of dereferencing the pointer.

Signed-off-by: Jack Morgenstein <[EMAIL PROTECTED]>

---

Roland,

We encountered this problem (a kernel Oops) in our regression testing
(bugzilla bug 1040). The test randomly takes the HCA physical port down
and then up, so a "flush" can occur while IPoIB mcast initialization is
still in progress.
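
For readers who want to see the race fix in isolation, here is a minimal
user-space sketch of the same check-under-lock pattern. It is not the
driver code: struct mcast_group, struct dev_priv, model_flush() and
model_cache_qkey() are hypothetical stand-ins, a pthread mutex plays the
role of priv->lock, the byte-order conversion is left out, and the model
assumes the flush path clears the broadcast pointer while holding the
same lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures (not the real IPoIB types). */
struct mcast_group {
	uint32_t qkey;
};

struct dev_priv {
	pthread_mutex_t lock;          /* plays the role of priv->lock */
	struct mcast_group *broadcast; /* NULL once a flush has run */
	uint32_t qkey;                 /* cached Q_Key */
};

/* Models the flush path: drop the broadcast group under the lock (assumed). */
static void model_flush(struct dev_priv *priv)
{
	pthread_mutex_lock(&priv->lock);
	priv->broadcast = NULL;
	pthread_mutex_unlock(&priv->lock);
}

/*
 * Models the patched part of the join-finish path: test the pointer and
 * read through it inside one critical section, so a flush cannot slip in
 * between the check and the dereference.
 */
static int model_cache_qkey(struct dev_priv *priv)
{
	pthread_mutex_lock(&priv->lock);
	if (!priv->broadcast) {
		pthread_mutex_unlock(&priv->lock);
		return -EAGAIN; /* flushed underneath us; caller may retry */
	}
	priv->qkey = priv->broadcast->qkey;
	pthread_mutex_unlock(&priv->lock);
	return 0;
}
```

Calling model_cache_qkey() before model_flush() caches the Q_Key and
returns 0; calling it afterwards returns -EAGAIN and leaves the cached
value untouched.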

Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- ofed_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-05-19 15:48:17.000000000 +0300
+++ ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-05-19 16:07:52.723294000 +0300
@@ -194,7 +194,13 @@ static int ipoib_mcast_join_finish(struc
        /* Set the cached Q_Key before we attach if it's the broadcast group */
        if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
                    sizeof (union ib_gid))) {
+               spin_lock_irq(&priv->lock);
+               if (!priv->broadcast) {
+                       spin_unlock_irq(&priv->lock);
+                       return -EAGAIN;
+               }
                priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
+               spin_unlock_irq(&priv->lock);
                priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
        }
 

