Ipoib can miss a change in dgid under some conditions.  The problem is
caused when ipoib_neigh->dgid contains a stale address.  The fix is to
set ipoib_neigh->dgid to zero in ipoib_neigh_alloc().

Detail description: A systems using bonding on its ipoib interface has
switched it active slave interface from interface A to B and back to A
setting up the situation for this bug.  The system that fails will not
correctly processes the 2nd address change.

When an address has changed neighbor->ha is updated with the new address.
Each neighbor has an associated ipoib_neigh.  ipoib_neigh->dgid also
holds a copy of the remote node's hardware address.  When an address
changes neighbor->ha is updated by the network layer (arp code) with the
new address.  Ipoib detects this change in ipoib_start_xmit() by comparing
neighbor->ha with ipoib_neigh->dgid.  The bug is that ipoib_neigh->dgid
already contains the new address(A) thus the change from B to A is missed
by ipoib.  Here is the sequence of events:

ipoib_neigh->dgid = A neighbor->ha=A

The address is switched to B (the first switch)

neighbor->ha=B

The change is seen in ipoib_start_xmit(). neighbor->ha !=
ipoib_neigh->dgid

The ipoib_neigh is released, and a new one is allocated.

The memory allocation system returned the same chunk of memory that was
just released, therefore ipoib_neigh->dgid still contains A at this point.

ipoib_neigh->dgid should be updated in neigh_add_path(), but if the
following conditions are true dgid is not updated.

        1) __path_find() returns a path

        2) path->ah is NULL

The remote system now switches from address B to A, neighbor->ha is
updated to A.

Now we have: ipoib_neigh->dgid = A neighbor->ha=A

Since the address are the same ipoib won't process the change in address.

Signed-off-by: David Wilder <[email protected]>

------------------------------------------------------
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 2bf5116..25ef50b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -884,6 +884,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour 
*neighbour,
 
        neigh->neighbour = neighbour;
        neigh->dev = dev;
+       memset(&neigh->dgid.raw, 0, sizeof(union ib_gid));
        *to_ipoib_neigh(neighbour) = neigh;
        skb_queue_head_init(&neigh->queue);
        ipoib_cm_set(neigh, NULL);


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to