As a recap to this thrilling thread, I'm tracking down a panic with 
a backtrace like this:

     ib_ipoib:ipoib_neigh_cleanup+368
     ....
     neigh_periodic_timer+0
     run_timer_softirq+348
     __do_softirq+85
     call_softirq+30
     do_softirq+44
     .....

And the following helpful hint:
Unable to handle kernel paging request at 0000000000100108
                                          ^^^^^^^^^^^^^^^^
                                          LIST_POISON1+0x8

So, we're in ipoib_neigh_cleanup(), doing the list_del():

static void ipoib_neigh_cleanup(struct neighbour *n)
{
        .......
        neigh = *to_ipoib_neigh(n);
        .....
        spin_lock_irqsave(&priv->lock, flags);
        if (neigh->ah)
                ah = neigh->ah;
        list_del(&neigh->list);
        ipoib_neigh_free(n->dev, neigh);

        spin_unlock_irqrestore(&priv->lock, flags);


This has been practically impossible to reproduce (and I don't 
even have the original crashdump available any longer). 

What would prevent a race between a tx completion (with an 
error) and the cleanup of a neighbour? In that case the tx 
completion handler could do:

ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
{
        ........
        if (wc->status != IB_WC_SUCCESS &&
            wc->status != IB_WC_WR_FLUSH_ERR) {
                .....
                spin_lock_irqsave(&priv->lock, flags);
                neigh = tx->neigh;

                if (neigh) {
                        neigh->cm = NULL;
                        list_del(&neigh->list);
                .......
                spin_unlock_irqrestore(&priv->lock, flags);

While ipoib_neigh_cleanup() could grab the (now stale) neigh, and 
crash like above.

(I've tried simulating tx completion failures to trigger this 
behavior, but haven't gotten lucky yet....)

-- 
Arthur

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to