Ms. Megan Larko wrote:
> I changed the IB cable on the problem box using the same IB card, PCI
> slot and slot on the IB SilverStorm switch.
> The errors I now see on the clients are the same but the server OSS
> for crew8-OST0000 thru crew8-OST0011 are:
> ib0: multicast join failed for
> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
> LustreError: 4346:0:(filter.c:2674:filter_destroy_precreated())
> LustreError:4486:0:(ldmlm_lib.c:1442:target_send_reply_msg(()@@processing
> error -107
>
> Perhaps it could be the IB card? It is a Mellanox Technologies
> MT25204 [InfiniHost III Lx HCA]. This is the same card in many, but
> not all, of our other systems. I can try a new IB card on Monday.
>
Which subnet manager are you using? You should look at its log files to
see why you are getting these "multicast join failed" messages, which
indicate that something is pretty wrong with the InfiniBand fabric. For
some reason (for example, the nodes do not support the speed used for
the multicast group), they could not join the group. This is especially
critical because this particular multicast group carries all IPv4
broadcast traffic (e.g., IPv4 ARP requests).

Since InfiniBand multicast is not well understood, let me summarize:

The SM assigns a multicast LID to each multicast group. Most switches
only support 1024 multicast LIDs, and some SMs cannot map more than one
group to the same LID, so multicast sometimes breaks when you get too
many groups (i.e., more than ~900 nodes with just link-local IPv6
addresses - see below).

When a node first joins a multicast group, it selects the group speed
(typically SDR 4x or DDR 4x). Nodes that do not support at least that
speed are not allowed to join later, because all multicast messages for
that LID are sent at that speed (i.e., an SDR node cannot join a DDR
mcast group, as it could not keep up).

With IPv6, ARPs are done using multicast (which is perfect for broadcast
LANs, where only the target node takes an interrupt to process the ARP
request), which can lead to a multicast group being created per IPv6
address. Note that IPv4 also uses a few multicast groups. With
InfiniBand it is a little messy: the node has to query the SM for the
multicast group list to learn which LID to use when sending the
multicast ARP.

Try checking the link speeds, and looking at the output of "saquery -g".

Kevin Van Maren
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
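
As a rough illustration of two points in the reply above, here is a small Python sketch. It decodes the MGID from the "multicast join failed" message using the IPoIB field layout from RFC 4391, which shows that the failing group is the IPv4 broadcast group, and it models the join rule that a node must support at least the group's speed. The function names (decode_ipoib_mgid, can_join) and the rate table are made up for this example and are not part of any InfiniBand tool; the rates are the nominal 4x signalling rates, not values read from a real fabric.

```python
import ipaddress

# IPoIB MGID signatures per RFC 4391: 0x401b marks IPv4, 0x601b marks IPv6.
SIGNATURES = {0x401B: "IPv4", 0x601B: "IPv6"}

# Nominal 4x signalling rates in Gbit/s (illustrative table, an assumption).
RATE_GBPS = {"SDR 4x": 10, "DDR 4x": 20}


def decode_ipoib_mgid(mgid: str) -> dict:
    """Split an IPoIB multicast GID into its RFC 4391 fields."""
    raw = ipaddress.IPv6Address(mgid).packed  # 16 bytes, network order
    if raw[0] != 0xFF:
        raise ValueError("not a multicast GID")
    signature = int.from_bytes(raw[2:4], "big")
    group_id = int.from_bytes(raw[12:16], "big")
    return {
        "scope": raw[1] & 0x0F,  # 2 means link-local
        "protocol": SIGNATURES.get(signature, "unknown"),
        "pkey": int.from_bytes(raw[4:6], "big"),
        "group_id": group_id,
        # An all-ones group ID under the IPv4 signature is the broadcast group.
        "is_ipv4_broadcast": signature == 0x401B and group_id == 0xFFFFFFFF,
    }


def can_join(node_rate: str, group_rate: str) -> bool:
    """A node may join only if it supports at least the group's rate."""
    return RATE_GBPS[node_rate] >= RATE_GBPS[group_rate]


if __name__ == "__main__":
    print(decode_ipoib_mgid("ff12:401b:ffff:0000:0000:0000:ffff:ffff"))
    print("SDR node joining DDR group:", can_join("SDR 4x", "DDR 4x"))
```

If the group on your fabric was created at DDR 4x and the problem node only links at SDR 4x, this is the kind of mismatch that would be rejected. The real decision is made by the SM, so its logs and "saquery -g" remain the authoritative source.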
