On Mon, Apr 28, 2008 at 07:14:39PM +0300, Olga Shern (Voltaire) wrote:
> ...
> https://bugs.openfabrics.org/show_bug.cgi?id=985 we will try to reproduce
> it on upstream kernel and let you know
I just saw this bug report today, but we've had similar crashes.
Looks like the problem is that in ipoib_neigh_cleanup() this is
done (no locking):
        neigh = *to_ipoib_neigh(n);
then later:
        spin_lock_irqsave(&priv->lock, flags);
        if (neigh->ah)
                ah = neigh->ah;
        list_del(&neigh->list);         <---- neigh may be stale now
        ipoib_neigh_free(n->dev, neigh);
        spin_unlock_irqrestore(&priv->lock, flags);
neigh wasn't re-read after acquiring the lock, so it may point
to an already freed data structure.
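
To make the pattern concrete, here is a rough user-space sketch of reading the pointer only after the lock is held. It is not the driver code and not a tested patch: the model_* names are made up for illustration, a pthread mutex stands in for priv->lock, and "slot" stands in for *to_ipoib_neigh(n). It also assumes the other path clears the slot under the same lock, which is what lets the cleanup side notice that the entry is gone.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the IPoIB structures, not the real types. */
struct model_neigh {
        int ah;                         /* stand-in for the address handle */
};

struct model_priv {
        pthread_mutex_t lock;           /* plays the role of priv->lock */
        struct model_neigh *slot;       /* plays the role of *to_ipoib_neigh(n) */
};

/* Some other path (a flush, say) frees the entry and clears the slot,
 * all while holding the lock. */
static void model_flush(struct model_priv *priv)
{
        pthread_mutex_lock(&priv->lock);
        free(priv->slot);
        priv->slot = NULL;
        pthread_mutex_unlock(&priv->lock);
}

/* Cleanup that reads the pointer only after taking the lock.
 * Returns 1 if it freed the entry, 0 if nothing was left to free. */
static int model_cleanup(struct model_priv *priv)
{
        struct model_neigh *neigh;
        int freed = 0;

        pthread_mutex_lock(&priv->lock);
        neigh = priv->slot;             /* read under the lock, never stale */
        if (neigh) {
                free(neigh);
                priv->slot = NULL;
                freed = 1;
        }
        pthread_mutex_unlock(&priv->lock);
        return freed;
}
```

If model_flush() runs first, model_cleanup() sees NULL and returns 0 instead of touching freed memory. A version that copied priv->slot into neigh before taking the lock would still hold the old pointer at that point.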
In our crashes we had backtraces like:
        RIP: ib_ipoib:ipoib_neigh_cleanup+368
        neigh_destroy+197
        neigh_periodic_timer+249
        neigh_periodic_timer+0
        run_timer_softirq+348
        __do_softirq+85
        call_softirq+30
        do_softirq+44
        .....
And the following helpful hint:
Unable to handle kernel paging request at 0000000000100108
                                          ^^^^^^^^^^^^^^^^
                                         LIST_POISON1 + 0x8
So we were dying in the midst of list_del().
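
For anyone who wants to check the arithmetic, below is a small user-space sketch of why a second list_del() on the same entry faults at that address. The struct layout and the poison values follow include/linux/list.h and include/linux/poison.h from kernels of that era (LIST_POISON1 is 0x00100100 there); model_list_del() and second_del_fault_addr() are hypothetical names and only simplified models of the kernel code. The fault address is computed, never dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the kernel's struct list_head. */
struct list_head {
        struct list_head *next, *prev;
};

/* Poison values as in include/linux/poison.h of that era. */
#define LIST_POISON1 ((uintptr_t)0x00100100)
#define LIST_POISON2 ((uintptr_t)0x00200200)

/* Simplified list_del(): unlink the entry, then poison its pointers. */
static void model_list_del(struct list_head *entry)
{
        entry->next->prev = entry->prev;
        entry->prev->next = entry->next;
        entry->next = (struct list_head *)LIST_POISON1;
        entry->prev = (struct list_head *)LIST_POISON2;
}

/* Address of the first store a second list_del() would attempt,
 * i.e. &entry->next->prev, computed without dereferencing it. */
static uintptr_t second_del_fault_addr(const struct list_head *entry)
{
        return (uintptr_t)entry->next + offsetof(struct list_head, prev);
}
```

After one model_list_del(), entry->next is LIST_POISON1, and prev sits one pointer past next in the struct. On a 64-bit build that gives 0x00100100 + 8 = 0x00100108, which matches the address in the oops.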
--
Arthur