On Wed, Feb 1, 2012 at 12:22 PM, David Miller <[email protected]> wrote: > > Any time an ipoib_neigh is changed, a sequence like the following is made: > > spin_lock_irqsave(&priv->lock, flags); > /* > * It's safe to call ipoib_put_ah() inside > * priv->lock here, because we know that > * path->ah will always hold one more reference, > * so ipoib_put_ah() will never do more than > * decrement the ref count. > */ > if (neigh->ah) > ipoib_put_ah(neigh->ah); > list_del(&neigh->list); > ipoib_neigh_free(dev, neigh); > spin_unlock_irqrestore(&priv->lock, flags); > ipoib_path_lookup(skb, n, dev); > > This doesn't work, because you're leaving a stale pointer to the freed up > ipoib_neigh in the special neigh->ha pointer cookie. Yes, it even fails > with all the locking done to protect _changes_ to *ipoib_neigh(n), and > with the code in ipoib_neigh_free() that NULLs out the pointer. > > The core issue is that read side calls to *to_ipoib_neigh(n) are not > being synchronized at all, they are performed without any locking. So > whether we hold the lock or not when making changes to *ipoib_neigh(n) > you still can have threads see references to freed up ipoib_neigh > objects. > > cpu 1 cpu 2 > n = *ipoib_neigh() > *ipoib_neigh() = NULL > kfree(n) > n->foo == OOPS > > To be honest I wouldn't mind if ipoib managed it's neighbour entries > differently. The way it works now is getting in the way of some > changes I want to make. Namely that I want to make it so that > dst_entry objects do not have an attached neighbour, instead > neighbours are always looked up on demand. > > Such on-demand neigh lookups require having access to the ipv4/ipv6 > header destination address of the packet, and in these ipoib paths we > don't have a real obvious way to get at that without lots of kludgy > tests. > > Sticking datastructure pointers into the hardware address of the neigh > is not pretty either :-) > > Perhaps the ipoib code can have a private path database it manages > entirely itself, which holds all the necessary information and is > looked up by some generic key which is available easily at transmit > time and does not involve generic neighbour entries.
Thanks David. This is probably the crux of some rare crashes we see in the ipoib driver around handling the neighbour database. I've made a few half-hearted stabs at fixing this up but I never really got past the impedence mismatch between the networking core and the strange requirements of the IPoIB neighbour handling. The problem is that we definitely want to use the core networking stack to resolve neighbours, since we clearly don't want a private implementation of ARP and IPv6 ND. But what IPoIB abstractly calls a "hardware address" is actually an IB GID, which does uniquely identify a destination but is not actually what we need to send a packet. So we do have the whole path database inside the ipoib driver, and in fact for things like unicast ARPs (which comes without a neighbour attached) we do lookup the hardware address to an internal ipoib path. The problem is that we don't want to do the expensive lookup of a full 16-byte GID to path for real datapath packets where we do have a neighbour resolved. So we try to do the lookup once for each network stack neighbour. Perhaps we could do some RCU-ish thing where we don't actually free the ipoib_neigh immediately? - R. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
