On 11/19/2014 11:08 AM, Or Gerlitz wrote:
Since this is a key field, correct handling requires that the group entry
be deleted from the rb table, and then re-inserted with the new key,
so that the table structure is properly maintained.
The current code does not do this correctly. Correct operation
requires that if the key-field gid has changed at all, it should
be deleted and re-inserted, fix that.
Sean, FWIW and even just for the fun, SB some logs from Jack's debugging
that could help you see the problem beyond the nice analysis done by
Jack in the change-log:
Suspected that re-balancing the rb tree confuses the find algorithm.
Verified:
rdma_join_multicast: ENTERING
CMA: ffff88087e0b4c00: cma_join_ib_multicast: cma_join_ib_multicast,
port=1, mgid = ff12:401b:ffff:0000:0000:0000:ffff:ffff
mcast_find. ENTERING. port = 1, node=ffff8804543bf100
ib_sa_get_mcmember_rec: mcast_find SUCCESS. dev=mlx5_0, start_port=1,
port=1, mgid=ff12:401b:ffff:0000:0000:0000:ffff:ffff
mcast_insert.ENTERING. port = 1, *link = ffff8804543bf100
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 ,
curr group mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff,
*link=ffff8804543bf100, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 ,
curr group mgid ff12:a01b:fe80:0000:0000:de1e:0000:0000,
*link=ffff88086d74a900, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 ,
curr group mgid ff12:a01b:fe80:0000:0000:e21e:0000:0000,
*link=ffff88084da54ac0, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 ,
curr group mgid ff12:a01b:fe80:0000:0000:e41e:0000:0000,
*link=ffff8808976f5dc0, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 ,
curr group mgid ff12:a01b:fe80:0000:0000:e61e:0000:0000,
*link=ffff880892e44ec0, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 ,
curr group mgid ff12:a01b:fe80:0000:0000:e71e:0000:0000,
*link=ffff8804607d2ac0, -1
mcast_insert. BEFORE insert color: root rb_node=ffff8804543bf100
mcast_insert. AFTER root rb_node=ffff88086d74a900 <== rb tree
rebalanced (i.e., rotated) here.
mcast_insert traversal. lgroup mgid
0000:0000:0000:0000:0000:0000:0000:0000, node=ffff8807e987aac0
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:e71e:0000:0000, node=ffff8804607d2ac0
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:e61e:0000:0000, node=ffff880892e44ec0
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:e51e:0000:0000, node=ffff8802c5e0d700
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:e41e:0000:0000, node=ffff8808976f5dc0
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:e31e:0000:0000, node=ffff88041a4f4f00
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:e21e:0000:0000, node=ffff88084da54ac0
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:e11e:0000:0000, node=ffff88047244a1c0
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:e01e:0000:0000, node=ffff880897de4dc0
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:df1e:0000:0000, node=ffff8804904cfdc0
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:de1e:0000:0000, node=ffff88086d74a900
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:dd1e:0000:0000, node=ffff88038f5bf300
mcast_insert traversal. lgroup mgid
ff12:a01b:fe80:0000:0000:dc1e:0000:0000, node=ffff88045c023f00
mcast_insert traversal. lgroup mgid
ff12:401b:ffff:0000:0000:0000:0000:0001, node=ffff880451271b00
mcast_insert traversal. lgroup mgid
ff12:401b:ffff:0000:0000:0000:ffff:ffff, node=ffff8804543bf100 <====
Do not find this entry!
mcast_insert traversal. lgroup mgid
ff12:601b:ffff:0000:0000:0000:0000:0001, node=ffff88045c023800
mcast_insert traversal. lgroup mgid
ff12:601b:ffff:0000:0000:0000:0000:0002, node=ffff88046431aac0
mcast_insert traversal. lgroup mgid
ff12:601b:ffff:0000:0000:0000:0000:0016, node=ffff88045c023300
mcast_insert traversal. lgroup mgid
ff12:601b:ffff:0000:0000:0001:ff16:7520, node=ffff88047a04c600
ucma_join_multicast: ENTERING
rdma_join_multicast: ENTERING
CMA: ffff880497488800: cma_join_ib_multicast: cma_join_ib_multicast,
port=1, mgid = ff12:401b:ffff:0000:0000:0000:ffff:ffff
mcast_find. ENTERING. port = 1, node=ffff88086d74a900
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid
ff12:a01b:fe80:0000:0000:de1e:0000:0000, node=ffff88086d74a900
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid
ff12:a01b:fe80:0000:0000:e21e:0000:0000, node=ffff88084da54ac0
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid
ff12:a01b:fe80:0000:0000:e41e:0000:0000, node=ffff8808976f5dc0
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid
ff12:a01b:fe80:0000:0000:e61e:0000:0000, node=ffff880892e44ec0
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid
ff12:a01b:fe80:0000:0000:e71e:0000:0000, node=ffff8804607d2ac0
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid
ff12:a01b:fe80:0000:0000:e81e:0000:0000, node=ffff8807e987aac0
mcast_find FAIL. loop=6, mgid=ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib_sa_get_mcmember_rec: mcast_find FAILED. dev=mlx5_0, start_port=1,
port=1, mgid=ff12:401b:ffff:0000:0000:0000:ffff:ffff
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html