> I just saw this bug report today, but we've had similar crashes.
> Looks like the problem is that in ipoib_neigh_cleanup() this is
> done (no locking):
>
>         neigh = *to_ipoib_neigh(n);
>
> then later:
>
>         spin_lock_irqsave(&priv->lock, flags);
>         if (neigh->ah)
>                 ah = neigh->ah;
>         list_del(&neigh->list);   <---- neigh may be stale now
>         ipoib_neigh_free(n->dev, neigh);
>         spin_unlock_irqrestore(&priv->lock, flags);
>
> neigh wasn't re-read after acquiring the lock, so it may point
> to an already freed data structure.
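
For illustration only, here is a minimal userspace sketch of the "re-read after taking the lock" idea described in the quoted text. It is not the IPoIB code: `neigh_entry`, `neighbour_slot`, `slot_lock` and `neigh_cleanup_locked()` are hypothetical names, a pthread mutex stands in for the spinlock, and it assumes one global lock that can always be taken. The real driver has no such lock available at this point, so this shows only the pattern, not a fix.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ipoib_neigh. */
struct neigh_entry {
    int ah;
};

/* Hypothetical stand-in for the slot that to_ipoib_neigh(n) points at. */
struct neighbour_slot {
    struct neigh_entry *neigh;
};

/* Assumption: a single global lock protects every slot. */
static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Read the pointer only while holding the lock, and clear the slot
 * before freeing, so a second caller sees NULL instead of a stale
 * pointer. Returns 1 if this call freed the entry, 0 otherwise.
 */
static int neigh_cleanup_locked(struct neighbour_slot *n)
{
    struct neigh_entry *neigh;
    int freed = 0;

    pthread_mutex_lock(&slot_lock);
    neigh = n->neigh;          /* read under the lock, not before it */
    if (neigh != NULL) {
        n->neigh = NULL;       /* nobody else can pick up this pointer now */
        free(neigh);
        freed = 1;
    }
    pthread_mutex_unlock(&slot_lock);

    return freed;
}
```

With this shape, calling the cleanup twice on the same slot frees the entry once and the second call is a no-op, instead of operating on freed memory.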

Ugh, looks delicate to fix properly, since we don't have a lock to take
until we find out whether the neighbour is attached to an IPoIB device.

> Unable to handle kernel paging request at 0000000000100108
>                                           ^^^^^^^^^^^^^^^^
>                                           LIST_POISON1 + 0x8

Strange that the OFA bugzilla entry has a different address it's crashing at.
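
As a sanity check on the "LIST_POISON1 + 0x8" annotation, the arithmetic can be sketched in plain C. This assumes the LIST_POISON1 value of 0x00100100 from include/linux/poison.h in kernels of this era (newer kernels may add an offset) and a 64-bit build. `LIST_POISON1_ADDR` and `poisoned_prev_store_addr()` are names made up for this example. The likely reading is that list_del() ran on an entry that had already been deleted: the first list_del() sets entry->next to LIST_POISON1, and the second one then stores through entry->next->prev.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the kernel's struct list_head: next first, then prev. */
struct list_head {
    struct list_head *next, *prev;
};

/* Assumed value of LIST_POISON1 (kernels of this era, no pointer delta). */
#define LIST_POISON1_ADDR ((uintptr_t)0x00100100)

/*
 * Address touched by the store "next->prev = prev" inside list_del()
 * when entry->next has already been poisoned with LIST_POISON1.
 */
static uintptr_t poisoned_prev_store_addr(void)
{
    return LIST_POISON1_ADDR + offsetof(struct list_head, prev);
}
```

On a 64-bit machine offsetof(struct list_head, prev) is 8, so the function returns 0x100108, which matches the faulting address in the oops.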