IPoIB: fix Oops when "join finish" runs just after a device flush.
ipoib_mcast_join_finish() can run just after ipoib_mcast_dev_flush() has
been invoked, in which case priv->broadcast is NULL and dereferencing it
causes an Oops. Test for this case under priv->lock and return -EAGAIN
instead of dereferencing the NULL pointer.
Signed-off-by: Jack Morgenstein <[EMAIL PROTECTED]>
---
Roland,
We encountered this problem (a kernel Oops) in our regression testing
(bugzilla bug 1040). The test randomly takes the HCA physical port down
and then up, so a "flush" can occur while IPoIB mcast initialization is
still in progress.
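
For anyone who wants to see the pattern without the driver source at hand,
here is a minimal user-space sketch of the check. It is not driver code:
the model_* names are hypothetical, a pthread mutex stands in for
spin_lock_irq() on priv->lock, and it assumes that the flush clears the
broadcast pointer while holding the same lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the driver structures. */
struct model_mcast {
	uint32_t qkey;
};

struct model_priv {
	pthread_mutex_t lock;          /* plays the role of priv->lock */
	struct model_mcast *broadcast; /* NULL once a flush has run */
	uint32_t qkey;                 /* cached Q_Key */
};

/* Models the flush (assumption: the pointer is cleared under the lock). */
static void model_flush(struct model_priv *priv)
{
	pthread_mutex_lock(&priv->lock);
	priv->broadcast = NULL;
	pthread_mutex_unlock(&priv->lock);
}

/*
 * Models the guarded part of join finish. The NULL test and the
 * dereference sit in one critical section, so a flush cannot slip in
 * between them. Returns 0 on success or -EAGAIN if the group is gone.
 */
static int model_join_finish(struct model_priv *priv)
{
	pthread_mutex_lock(&priv->lock);
	if (!priv->broadcast) {
		pthread_mutex_unlock(&priv->lock);
		return -EAGAIN;
	}
	priv->qkey = priv->broadcast->qkey;
	pthread_mutex_unlock(&priv->lock);
	return 0;
}
```

Calling model_join_finish() after model_flush() returns -EAGAIN and leaves
the cached Q_Key untouched, which is the behaviour the patch aims for.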
Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- ofed_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-05-19 15:48:17.000000000 +0300
+++ ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-05-19 16:07:52.723294000 +0300
@@ -194,7 +194,13 @@ static int ipoib_mcast_join_finish(struc
 	/* Set the cached Q_Key before we attach if it's the broadcast group */
 	if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
 		    sizeof (union ib_gid))) {
+		spin_lock_irq(&priv->lock);
+		if (!priv->broadcast) {
+			spin_unlock_irq(&priv->lock);
+			return -EAGAIN;
+		}
 		priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
+		spin_unlock_irq(&priv->lock);
 		priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
 	}