On Thu, 2008-06-12 at 15:58 +0300, Yevgeny Kliteynik wrote:
> Hal Rosenstock wrote:
> > On Thu, 2008-06-12 at 14:08 +0300, Olga Shern (Voltaire) wrote:
> >>
> >> On 6/12/08, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> >> Hi Olga,
> >>
> >> On Thu, 2008-06-12 at 09:46 +0300, Olga Shern wrote:
> >> > Hi All,
> >> >
> >> > We have found something that seems like Infiniband Spec hole,
> >>
> >> What's the spec hole ?
> >>
> >> According to the Infiniband spec - partial member cannot "talk" with
> >> partial member only with full member.
> >> Therefore if partial member sending MC packet - all other partial
> >> members of this partition will generate BAD PKEY trap.
> >> It means that the behavior that we see is according to Infiniband
> >> Spec - but very problematic
> >
> > Originally, multicast groups were all full member only and more recently
> > was this extended to allow partial members and this was missed. A
> > comment should be filed against the spec on this.
> >
> >> > This issue is system issue that prevents from partial P_Key
> >> > setup to go into production.
> >>
> >> Indeed :-(
> >>
> >> > Short Setup & test description:
> >> > ------------------------------------------
> >> > * Node A: P_Key XXX (full member)
> >> > * Node B, C, D, E, F: P_Key XXx (partial member)
> >> >
> >> > 1. Send ping from B -> A : ping is OK
> >> > 2. Send ping from C -> A : ping is OK
> >> > 3. Send ping from B -> C : no ping also OK
> >> > * Get traps Bad P_Key in SM - from all HCA in the fabric both for
> >> > test 1 & 2 (one time) and also for test 3 (all the time).
> >
> > What does all the time mean ? Does this mean with one test 3 ping, the
> > traps are repeated ? If so, at what rate ?
>
> Also, why do the HCAs issue these traps? Is the pkey enforcement
> on switch external ports is off?
I presume so but there was a claim about what would happen if
ingress/egress filtering were on (about getting the switch rather than
end port bad PKey traps).

> AFAIK, by default, OpenSM should
> configure pkeys on switch ports that are connected to these HCAs,
> so that partial member wouldn't get packet from another partial
> member.

It was done using VSM not OpenSM.

-- Hal

> -- Yevgeny
>
> >> > Probably the ARP request that is MC traffic generate the trap
> >> > in HCA, for test 1 & 2 we have only one ARP but for test 3 we
> >> > send ARP all the time because we do not get any ARP reply.
> >> >
> >> > * The trap number SM get is 257 (HCA trap) if we will do P_Key
> >> > switch enforcement we will probably get 259
> >>
> >> Is this with OpenSM or VSM ?
> >>
> >> We tested it with Voltaire SM but it should behave the same with
> >> OpenSM.
> >
> > That's likely but I'm not sure yet.
> >
> >> -- Hal
> >>
> >> > * We get trap also from the originator of the MC traffic even
> >> > though that receive switch relay error counter is increased
> >> > (when out port==in port), the switch does not drop the packet ?
> >
> > The implementation of that counter is broken and occurs "normally". The
> > increment of this counter is relatively meaningless :-(
> >
> >> > Additional questions/issues:
> >> > * Do we have a way to suppress port traps from SMA ?? i.e. that
> >> > the port will not generate traps that can "kill the SM" - as
> >> > its look this is bug in the spec where we can't send any mc
> >> > traffic (even ARP) when we have partial members and we do not
> >> > have a way to suppress the traps.
> >
> > All the SM can do is TrapRepress.
> >
> >> > * What will happen in the HCA when we get many traps (mc packets
> >> > from many nodes) and they need to keep all events until SM will
> >> > acknowledge? - Is there limitation in the number of on-going
> >> > traps (any HCA specific issues)?
> >
> > Assuming you mean events from which traps are generated, I think this is
> > left as an implementation dependent detail in terms of the spec. An
> > implementation needs to take care not to lose certain events; others
> > like this aren't critical but that's left to the specific SMA
> > implementation.
> >
> > -- Hal
> >
> >> > Best Regards
> >> >
> >> > Olga
> >> >
> >> > _______________________________________________
> >> > general mailing list
> >> > [email protected]
> >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >> >
> >> > To unsubscribe, please visit
> >> > http://openib.org/mailman/listinfo/openib-general
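
For anyone following the thread, the membership rule under discussion can
be written down in a few lines. This is only a sketch of the P_Key match
as I understand it (low 15 bits identify the partition, bit 15 is the
full-member bit, and at least one side must be a full member). The node
names follow the test setup quoted above, but the P_Key values 0x8001 and
0x0001 are made up for illustration, since the original report only says
"XXX". The helper names are hypothetical and are not part of any SM or
HCA code.

```python
FULL_MEMBER_BIT = 0x8000  # bit 15 of the 16-bit P_Key: set means full member
BASE_MASK = 0x7FFF        # low 15 bits identify the partition

BAD_PKEY_TRAP = 257       # end-port bad P_Key trap number mentioned in the thread


def pkeys_match(sender: int, receiver: int) -> bool:
    """True if a packet carrying `sender` passes the check against `receiver`."""
    same_partition = (sender & BASE_MASK) == (receiver & BASE_MASK)
    one_is_full = bool((sender | receiver) & FULL_MEMBER_BIT)
    return same_partition and one_is_full


def ports_raising_trap(sender_pkey: int, receivers: dict) -> list:
    """Names of receiving ports that would reject the packet (and so trap)."""
    return sorted(
        name for name, pkey in receivers.items()
        if not pkeys_match(sender_pkey, pkey)
    )


# Hypothetical values: A is a full member, B..F are partial members.
nodes = {"A": 0x8001, "B": 0x0001, "C": 0x0001,
         "D": 0x0001, "E": 0x0001, "F": 0x0001}

# A multicast ARP sent by B reaches every other member of the group.
others = {name: pkey for name, pkey in nodes.items() if name != "B"}
rejecting = ports_raising_trap(nodes["B"], others)
```

With these values only A accepts B's multicast packet; C, D, E and F all
fail the check, which is consistent with the bad P_Key traps seen from the
partial members for a single ARP request.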
