> If I am not mistaken the issue you mention is a little different from the
> one I pointed out. Without bonding I see the following:
>
> kernel: ib0: multicast join failed for
> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
>
> However, with bonding what I see is:
>
> ib0: multicast join failed for
> 0001:0000:0000:0000:0000:0000:0000:0000, status -22
Please note that -11 is EAGAIN (try again) and -22 is EINVAL (invalid argument). So you can get EAGAIN when the underlying core SA agent is not ready to send SA queries, while you get EINVAL when attempting to join on a junk MGID. I am confident that we have been seeing joins on junk MGIDs for a long time; it has been reported on this list (google...) in the past, with no resolution yet.

Under bonding there might be a window in time where, from the kernel network stack's perspective, the bonding device's ether-type is Ethernet rather than InfiniBand, and hence the wrong function (ip_eth_mc_map instead of ip_ib_mc_map) would be called to do the mapping from the IP multicast address to the HW multicast address.

> Subsequently an ib-bond status does not reveal any slave as active, as
> shown below:
>
> ib-bond --status
> bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:00:00:03:05:b9
> slave0: ib0
> slave1: ib1

As this script is not standard and is deprecated, I would recommend not using it, but rather the classic /proc/net/bonding/bond0 entry, along with "ip addr show" on bond0, ib0, and ib1.

Or.
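P.S. To illustrate the mapping difference, below is a minimal user-space sketch modeled on the kernel's ip_eth_mc_map() and ip_ib_mc_map() helpers in include/net/ip.h. The names eth_mc_map/ib_mc_map and the byte layout are simplified for illustration (for example, newer kernels also fold the broadcast GID's scope and P_Key into the IPoIB mapping), so treat it as a sketch rather than the kernel code:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

/* Ethernet mapping: 01:00:5e + low 23 bits of the IPv4 group
 * (simplified analogue of the kernel's ip_eth_mc_map) */
static void eth_mc_map(uint32_t naddr, unsigned char *buf /* 6 bytes */)
{
    uint32_t addr = ntohl(naddr);

    buf[0] = 0x01;
    buf[1] = 0x00;
    buf[2] = 0x5e;
    buf[3] = (addr >> 16) & 0x7f;  /* top bit of the low 24 is dropped */
    buf[4] = (addr >> 8) & 0xff;
    buf[5] = addr & 0xff;
}

/* IPoIB mapping: 20-byte HW address = multicast QPN + 16-byte MGID
 * with the ff12:401b (link-local scope, IPv4 signature) prefix and
 * the group in the low 28 bits (simplified ip_ib_mc_map analogue) */
static void ib_mc_map(uint32_t naddr, unsigned char *buf /* 20 bytes */)
{
    uint32_t addr = ntohl(naddr);

    memset(buf, 0, 20);
    buf[1]  = 0xff;                 /* multicast QPN 0xffffff */
    buf[2]  = 0xff;
    buf[3]  = 0xff;
    buf[4]  = 0xff;                 /* MGID starts here: ff12:401b:... */
    buf[5]  = 0x12;                 /* link-local scope */
    buf[6]  = 0x40;                 /* IPv4 signature */
    buf[7]  = 0x1b;
    buf[16] = (addr >> 24) & 0x0f;  /* low 28 bits of the group */
    buf[17] = (addr >> 16) & 0xff;
    buf[18] = (addr >> 8) & 0xff;
    buf[19] = addr & 0xff;
}

int main(void)
{
    uint32_t grp = inet_addr("224.0.0.1");  /* all-hosts group */
    unsigned char eth[6], ib[20];
    int i;

    eth_mc_map(grp, eth);
    ib_mc_map(grp, ib);

    printf("eth: ");
    for (i = 0; i < 6; i++)
        printf("%02x%c", eth[i], i < 5 ? ':' : '\n');
    printf("ib:  ");
    for (i = 0; i < 20; i++)
        printf("%02x%c", ib[i], i < 19 ? ':' : '\n');
    return 0;
}

The point is that the two mappings produce completely different hardware addresses (6 bytes vs 20 bytes). If the stack applies the Ethernet mapping while the slave is really IPoIB, whatever the join code then reads back as a 20-byte address is garbage, which would be consistent with the 0001:0000:... MGID and the EINVAL above.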
