On 11/19/2014 11:08 AM, Or Gerlitz wrote:
Since this is a key field, correct handling requires that the group entry
be deleted from the rb table, and then re-inserted with the new key,
so that the table structure is properly maintained.

The current code does not do this correctly. If the key-field gid
has changed at all, the group entry must be deleted and re-inserted;
fix that.
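
As an illustration of the point above, here is a minimal sketch. It assumes a plain unbalanced binary search tree keyed by memcmp() order on a 16-byte gid, not the kernel rb tree, and every name in it (struct group, tree_insert, tree_find, tree_erase, the two demo functions) is hypothetical rather than taken from the multicast code. It shows that overwriting the key of a node in place can leave the node where a lookup by the new key never goes, while erasing the node before the key change and inserting it afterwards keeps it findable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GID_LEN 16

/* Hypothetical stand-in for a multicast group entry; not the kernel struct. */
struct group {
	unsigned char gid[GID_LEN];	/* key field */
	struct group *left, *right;
};

/* Insert into a plain BST ordered by memcmp() on gid. */
static void tree_insert(struct group **root, struct group *g)
{
	struct group **link = root;

	assert(g != NULL);
	g->left = g->right = NULL;
	while (*link) {
		int c = memcmp(g->gid, (*link)->gid, GID_LEN);
		link = (c < 0) ? &(*link)->left : &(*link)->right;
	}
	*link = g;
}

/* Look up by key; returns NULL when the search path ends without a match. */
static struct group *tree_find(struct group *root, const unsigned char *gid)
{
	while (root) {
		int c = memcmp(gid, root->gid, GID_LEN);
		if (c == 0)
			return root;
		root = (c < 0) ? root->left : root->right;
	}
	return NULL;
}

/* Unlink g; must be called while g->gid still holds the key it was inserted with. */
static void tree_erase(struct group **root, struct group *g)
{
	struct group **link = root;

	while (*link && *link != g) {
		int c = memcmp(g->gid, (*link)->gid, GID_LEN);
		link = (c < 0) ? &(*link)->left : &(*link)->right;
	}
	if (!*link)
		return;
	if (!g->left) {
		*link = g->right;
	} else if (!g->right) {
		*link = g->left;
	} else {
		/* Two children: splice in the in-order successor. */
		struct group **s = &g->right;
		struct group *succ;

		while ((*s)->left)
			s = &(*s)->left;
		succ = *s;
		*s = succ->right;
		succ->left = g->left;
		succ->right = g->right;
		*link = succ;
	}
}

static void make_gid(unsigned char *gid, unsigned char tag)
{
	memset(gid, 0, GID_LEN);
	gid[0] = tag;
}

/* Returns 1 if a lookup by the new key fails after an in-place key overwrite. */
int in_place_update_loses_entry(void)
{
	struct group a, b, c, *root = NULL;
	unsigned char new_gid[GID_LEN];

	make_gid(a.gid, 0x50);
	make_gid(b.gid, 0x00);		/* placeholder inserted with a zero gid */
	make_gid(c.gid, 0x80);
	tree_insert(&root, &a);
	tree_insert(&root, &b);
	tree_insert(&root, &c);

	make_gid(new_gid, 0x60);
	memcpy(b.gid, new_gid, GID_LEN);	/* key changed, tree position not */
	return tree_find(root, new_gid) == NULL;
}

/* Returns 1 if erase, key change, insert leaves the entry findable. */
int reinsert_keeps_entry_findable(void)
{
	struct group a, b, c, *root = NULL;
	unsigned char new_gid[GID_LEN];

	make_gid(a.gid, 0x50);
	make_gid(b.gid, 0x00);
	make_gid(c.gid, 0x80);
	tree_insert(&root, &a);
	tree_insert(&root, &b);
	tree_insert(&root, &c);

	make_gid(new_gid, 0x60);
	tree_erase(&root, &b);			/* erase while the old key is valid */
	memcpy(b.gid, new_gid, GID_LEN);
	tree_insert(&root, &b);
	return tree_find(root, new_gid) == &b;
}
```

Both demo functions return 1 in this sketch. In the first, the entry with the rewritten key still hangs to the left of the 0x50 node, but a lookup for 0x60 walks to the right and never reaches it.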

Sean, FWIW, and even just for fun, see below some logs from Jack's debugging that could help you see the problem, beyond the nice analysis Jack did in the change-log:

Suspected that re-balancing the rb tree confuses the find algorithm. Verified:
rdma_join_multicast: ENTERING
CMA: ffff88087e0b4c00: cma_join_ib_multicast: cma_join_ib_multicast, port=1, mgid = ff12:401b:ffff:0000:0000:0000:ffff:ffff
mcast_find. ENTERING. port = 1, node=ffff8804543bf100
ib_sa_get_mcmember_rec: mcast_find SUCCESS. dev=mlx5_0, start_port=1, port=1, mgid=ff12:401b:ffff:0000:0000:0000:ffff:ffff
mcast_insert.ENTERING. port = 1, *link = ffff8804543bf100
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 , curr group mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff, *link=ffff8804543bf100, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 , curr group mgid ff12:a01b:fe80:0000:0000:de1e:0000:0000, *link=ffff88086d74a900, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 , curr group mgid ff12:a01b:fe80:0000:0000:e21e:0000:0000, *link=ffff88084da54ac0, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 , curr group mgid ff12:a01b:fe80:0000:0000:e41e:0000:0000, *link=ffff8808976f5dc0, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 , curr group mgid ff12:a01b:fe80:0000:0000:e61e:0000:0000, *link=ffff880892e44ec0, -1
mcast_insert. new grp mgid 0000:0000:0000:0000:0000:0000:0000:0000 , curr group mgid ff12:a01b:fe80:0000:0000:e71e:0000:0000, *link=ffff8804607d2ac0, -1

mcast_insert. BEFORE insert color: root rb_node=ffff8804543bf100
mcast_insert. AFTER root rb_node=ffff88086d74a900 <== rb tree rebalanced (i.e., rotated) here.

mcast_insert traversal. lgroup mgid 0000:0000:0000:0000:0000:0000:0000:0000, node=ffff8807e987aac0
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:e71e:0000:0000, node=ffff8804607d2ac0
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:e61e:0000:0000, node=ffff880892e44ec0
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:e51e:0000:0000, node=ffff8802c5e0d700
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:e41e:0000:0000, node=ffff8808976f5dc0
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:e31e:0000:0000, node=ffff88041a4f4f00
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:e21e:0000:0000, node=ffff88084da54ac0
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:e11e:0000:0000, node=ffff88047244a1c0
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:e01e:0000:0000, node=ffff880897de4dc0
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:df1e:0000:0000, node=ffff8804904cfdc0
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:de1e:0000:0000, node=ffff88086d74a900
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:dd1e:0000:0000, node=ffff88038f5bf300
mcast_insert traversal. lgroup mgid ff12:a01b:fe80:0000:0000:dc1e:0000:0000, node=ffff88045c023f00
mcast_insert traversal. lgroup mgid ff12:401b:ffff:0000:0000:0000:0000:0001, node=ffff880451271b00
mcast_insert traversal. lgroup mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff, node=ffff8804543bf100 <==== Do not find this entry!
mcast_insert traversal. lgroup mgid ff12:601b:ffff:0000:0000:0000:0000:0001, node=ffff88045c023800
mcast_insert traversal. lgroup mgid ff12:601b:ffff:0000:0000:0000:0000:0002, node=ffff88046431aac0
mcast_insert traversal. lgroup mgid ff12:601b:ffff:0000:0000:0000:0000:0016, node=ffff88045c023300
mcast_insert traversal. lgroup mgid ff12:601b:ffff:0000:0000:0001:ff16:7520, node=ffff88047a04c600
ucma_join_multicast: ENTERING
rdma_join_multicast: ENTERING
CMA: ffff880497488800: cma_join_ib_multicast: cma_join_ib_multicast, port=1, mgid = ff12:401b:ffff:0000:0000:0000:ffff:ffff
mcast_find. ENTERING. port = 1, node=ffff88086d74a900
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid ff12:a01b:fe80:0000:0000:de1e:0000:0000, node=ffff88086d74a900
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid ff12:a01b:fe80:0000:0000:e21e:0000:0000, node=ffff88084da54ac0
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid ff12:a01b:fe80:0000:0000:e41e:0000:0000, node=ffff8808976f5dc0
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid ff12:a01b:fe80:0000:0000:e61e:0000:0000, node=ffff880892e44ec0
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid ff12:a01b:fe80:0000:0000:e71e:0000:0000, node=ffff8804607d2ac0
mcast_find. mgid ff12:401b:ffff:0000:0000:0000:ffff:ffff != group gid ff12:a01b:fe80:0000:0000:e81e:0000:0000, node=ffff8807e987aac0
mcast_find FAIL. loop=6, mgid=ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib_sa_get_mcmember_rec: mcast_find FAILED. dev=mlx5_0, start_port=1, port=1, mgid=ff12:401b:ffff:0000:0000:0000:ffff:ffff

