On Wed, Nov 12, 2008 at 5:18 PM, <[EMAIL PROTECTED]> wrote:
>
> Here's a description of a problem we're seeing where multicast
> forwarding tables are apparently getting set up incorrectly. I'd
> appreciate any debug help from the opensm experts out there.
>
> On large clusters (>1000 nodes or so) we often see hundreds of errors
> from 'ibdiagnet -r' like the following (this is the simplest example
> I could find):
>
> -I- Multicast Group:0xC069 has:2 switches and:2 HCAs
> -E- Disconnected switch:S0800690000002e51/U1 in group:0xC069
> -E- Disconnected HCA:r4i2n10/U1
>
> These have invariably been multicast groups associated with IPv6
> solicited node multicast addresses, e.g., in this case 'saquery -m'
> shows only a single member, "r5lead":
>
> MCMemberRecord member dump:
> MGID....................0xff12601bffff0000 : 0x00000001ff26d289
> Mlid....................0xC069
> PortGid.................0xfe80000000000000 : 0x0002c9020026d289
> ScopeState..............0x1
> ProxyJoin...............0x0
> NodeDescription.........r5lead HCA-1
>
> ibdiagnet shows that "r5lead" is connected to the switch with lid
> 1609, port 24:
>
> Switch 24 "S-0800690000002db4" # "MT47396 Infiniscale-III Mellanox
> Technologies" base port 0 lid 1609 lmc 0
> [24] "H-0002c9020026d288"[1](2c9020026d289) # "r5lead HCA-1" lid
> 1576 4xDDR
>
> and the multicast forwarding table (from 'dump_mfts.sh') is consistent:
>
> Multicast mlids [0xc000-0xc3ff] of switch Lid 1609 guid 0x0800690000002db4
> (MT47396 Infiniscale-III Mellanox Technologies):
>          0                   1                   2
>   Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
>  MLid
> ....
> 0xc069                                                   x
>
>
> So far, so good. But we also have r4i2n10, connected to the switch with
> lid 1533 port 7:
>
> switchguid=0x800690000002e50(800690000002e50)
> Switch 24 "S-0800690000002e50" # "MT47396 Infiniscale-III Mellanox
> Technologies" base port 0 lid 1533 lmc 0
> ......
> [7] "H-003048c2438a0000"[1](3048c2438a0001) # "r4i2n10
> HCA-1" lid 771 4xDDR
>
> with this mft entry:
>
> Multicast mlids [0xc000-0xc3ff] of switch Lid 1533 guid 0x0800690000002e50
> (MT47396 Infiniscale-III Mellanox Technologies):
>          0                   1                   2
>   Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
>  MLid
> .....
> 0xc069                 x
>
> Any idea why "r4i2n10", with PortGid fe80::3048c2438a0001 would have a
> mft entry for the multicast group with MGID ff12601bffff::1ff26d289?

Are you using the consolidate IPv6 SNM (solicited node multicast)
option in OpenSM?

-- Hal

> Anyone else seen similar?
>
> --
> Arthur
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
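
For anyone following the thread, here is a rough sketch of why the MGID above looks like it belongs to r5lead and not to r4i2n10. An IPv6 solicited node multicast address carries the low 24 bits of the unicast address behind an 0xff prefix, so the low 32 bits of the MGID should be 0xff followed by those 24 bits. The sketch assumes each node's IPv6 interface ID comes straight from the port GUID shown in the dumps above (true for an autoconfigured link-local address; a manually assigned address could differ). The function names are made up for illustration and are not part of any OpenSM or ibutils tool.

```python
# Port GUIDs and the low 64 bits of the MGID, copied from the dumps above.
R5LEAD_GUID = 0x0002C9020026D289
R4I2N10_GUID = 0x003048C2438A0001
MGID_LOW64 = 0x00000001FF26D289


def snm_low32(interface_id: int) -> int:
    """Low 32 bits of the solicited node multicast address for an interface ID.

    Assumption: the interface ID equals the port GUID (link-local autoconfig).
    """
    return 0xFF000000 | (interface_id & 0xFFFFFF)


def matches_group(interface_id: int, mgid_low64: int) -> bool:
    """True if the interface ID would solicit on the group with this MGID."""
    return snm_low32(interface_id) == (mgid_low64 & 0xFFFFFFFF)


if __name__ == "__main__":
    for name, guid in (("r5lead", R5LEAD_GUID), ("r4i2n10", R4I2N10_GUID)):
        print(f"{name}: expected suffix 0x{snm_low32(guid):08x}, "
              f"matches 0xC069 group: {matches_group(guid, MGID_LOW64)}")
```

Under that assumption only r5lead maps onto this group, which agrees with the single member that 'saquery -m' reports. It does not explain the entry on the switch with lid 1533.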
