From a quick look at the code, it does look like there are some races in ipoib_multicast.c. The place where a QP is actually attached to a group is essentially (trimming debug prints):

	if (test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags))
		return 0;

	ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
				 &mcast->mcmember.mgid);

and the place where a QP is detached is:

	if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
		ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid),
					 &mcast->mcmember.mgid);

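To make the concern concrete, here is a rough userspace model of one interleaving this flag pattern seems to allow. It is only a sketch under assumptions: it is not the driver code, the interleaving is my guess rather than a trace from a real system, and every name in it (model_mcast, model_attach, replay_interleaving, and so on) is hypothetical. C11 atomics stand in for the kernel bit operations. The point is that the flag is set before the attach is issued, so a detach running in that window sees the flag and detaches a QP that is not attached yet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the IPOIB_MCAST_FLAG_ATTACHED bit. */
#define FLAG_ATTACHED 0x1u

struct model_mcast {
	atomic_uint flags;	/* models mcast->flags */
	int hw_attached;	/* models whether the QP is really attached */
	int bad_detach;		/* detaches issued while nothing was attached */
};

static void model_init(struct model_mcast *m)
{
	atomic_init(&m->flags, 0u);
	m->hw_attached = 0;
	m->bad_detach = 0;
}

/* Models test_and_set_bit(): sets the bit, returns its previous value. */
static bool model_test_and_set(struct model_mcast *m)
{
	return (atomic_fetch_or(&m->flags, FLAG_ATTACHED) & FLAG_ATTACHED) != 0;
}

/* Models test_and_clear_bit(): clears the bit, returns its previous value. */
static bool model_test_and_clear(struct model_mcast *m)
{
	return (atomic_fetch_and(&m->flags, ~FLAG_ATTACHED) & FLAG_ATTACHED) != 0;
}

/* Stand-ins for the attach and detach calls (assumed side effects only). */
static void model_attach(struct model_mcast *m)
{
	m->hw_attached = 1;
}

static void model_detach(struct model_mcast *m)
{
	if (!m->hw_attached)
		m->bad_detach++;
	m->hw_attached = 0;
}

/*
 * Replays one assumed interleaving in a single thread so the result is
 * deterministic. Returns 1 if the flag and the modelled hardware state
 * disagree at the end, 0 otherwise.
 */
static int replay_interleaving(struct model_mcast *m)
{
	bool was_set, flag_now;

	assert(m != NULL);

	/* Join path, step 1: the flag is set, the attach is not issued yet. */
	was_set = model_test_and_set(m);

	/* Leave path runs entirely inside that window. */
	if (model_test_and_clear(m))
		model_detach(m);

	/* Join path, step 2: the attach finally happens. */
	if (!was_set)
		model_attach(m);

	flag_now = (atomic_load(&m->flags) & FLAG_ATTACHED) != 0;
	return flag_now != (m->hw_attached != 0);
}
```

Calling model_init() and then replay_interleaving() on a model_mcast leaves the flag clear while the modelled QP is still attached, and records one detach that was issued with nothing attached. Whether the real code can hit this ordering depends on what locking surrounds those two call sites, which I have not checked here.
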
Going back to 2.6.20 (pre-multicast changes), this area of the code looks like it has the same race. Was IPoIB HA testing done on 2.6.20 or earlier versions of the code, and if so, were any issues found? (I'm not sure we've found all of the problems yet.)
- Sean
