On 6/12/08, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
>
> On Thu, 2008-06-12 at 16:31 +0300, Olga Shern (Voltaire) wrote:
> >
> > On 6/12/08, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > On Thu, 2008-06-12 at 14:08 +0300, Olga Shern (Voltaire) wrote:
> > > >
> > > > On 6/12/08, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > > > Hi Olga,
> > > > >
> > > > > On Thu, 2008-06-12 at 09:46 +0300, Olga Shern wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > We have found something that seems like Infiniband Spec hole,
> > > > >
> > > > > What's the spec hole ?
> > > >
> > > > According to the Infiniband spec - partial member cannot "talk" with
> > > > partial member only with full member.
> > > > Therefore if partial member sending MC packet - all other partial
> > > > members of this partition will generate BAD PKEY trap.
> > > > It means that the behavior that we see is according to Infiniband
> > > > Spec - but very problematic
> > >
> > > Originally, multicast groups were all full member only and more recently
> > > was this extended to allow partial members and this was missed. A
> > > comment should be filed against the spec on this.
> > >
> > > > > > This issue is system issue that prevents from partial P_Key setup to
> > > > > > go into production.
> > > > >
> > > > > Indeed :-(
> > > > >
> > > > > > Short Setup & test description:
> > > > > > ------------------------------------------
> > > > > > * Node A: P_Key XXX (full member)
> > > > > > * Node B, C, D, E, F: P_Key XXx (partial member)
> > > > > >
> > > > > > 1. Send ping from B -> A : ping is OK
> > > > > > 2. Send ping from C -> A : ping is OK
> > > > > > 3. Send ping from B -> C : no ping also OK
> > > > > > * Get traps Bad P_Key in SM - from all HCA in the fabric both for
> > > > > > test 1 & 2 (one time) and also for test 3 (all the time).
> > >
> > > What does all the time mean ? Does this mean with one test 3 ping, the
> > > traps are repeated ? If so, at what rate ?
> >
> > every ping will generate ARP that will generate BAD PKEY trap
>
> OK; so what do you mean by one time v. all the time ? Is that really the
> case ?
>
> > > > > > Probably the ARP request that is MC traffic generate the trap in HCA,
> > > > > > for test 1 & 2 we have only one ARP but for test 3 we send ARP all the
> > > > > > time because we do not get any ARP reply.
> > > > > >
> > > > > > * The trap number SM get is 257 (HCA trap) if we will do P_Key
> > > > > > switch enforcement we will probably get 259
> > > > >
> > > > > Is this with OpenSM or VSM ?
> > > >
> > > > We tested it with Voltaire SM but it should behave the same with
> > > > OpenSM.
> > >
> > > That's likely but I'm not sure yet.
>
> Would you try this with OpenSM (and validate your theory about getting
> switch bad PKey traps v. end port bad PKey traps) or does VSM have such
> a mode (ingress/egress partition filtering) ?
>
> -- Hal

Yes, I will test it with OpenSM

>
> -- Hal
>
> > > > > > * We get trap also from the originator of the MC traffic even
> > > > > > though that receive switch relay error counter is increased (when out
> > > > > > port==in port), the switch does not drop the packet ?
> > >
> > > The implementation of that counter is broken and occurs "normally". The
> > > increment of this counter is relatively meaningless :-(
> > >
> > > > > > Additional questions/issues:
> > > > > > * Do we have a way to suppress port traps from SMA ?? i.e. that
> > > > > > the port will not generate traps that can "kill the SM" - as its look
> > > > > > this is bug in the spec where we can't send any mc traffic (even ARP)
> > > > > > when we have partial members and we do not have a way to suppress the
> > > > > > traps.
> > >
> > > All the SM can do is TrapRepress.
> > >
> > > > > > * What will happen in the HCA when we get many traps (mc packets
> > > > > > from many nodes) and they need to keep all events until SM will
> > > > > > acknowledge? - Is there limitation in the number of on-going
> > > > > > traps (any HCA specific issues)?
> > >
> > > Assuming you mean events from which traps are generated, I think this is
> > > left as an implementation dependent detail in terms of the spec. An
> > > implementation needs to take care not to lose certain events; others
> > > like this aren't critical but that's left to the specific SMA
> > > implementation.
> > >
> > > -- Hal
> > >
> > > > > >
> > > > > > Best Regards
> > > > > >
> > > > > > Olga
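The partition rule quoted in the thread (a partial member can talk only to a full member, never to another partial member) can be modelled in a few lines. This is a minimal sketch, not code from any SMA, HCA or SM implementation: it assumes the usual IBA P_Key layout (bit 15 is the membership bit, 1 for full and 0 for partial; bits 0-14 are the partition base), uses the hypothetical values 0x8123 and 0x0123 as stand-ins for "XXX" and "XXx", and ignores switch partition enforcement (trap 259) as well as the originator-side traps reported above.

```python
from __future__ import annotations

# Minimal model of the IBA P_Key match rule for the setup in this thread.
# Assumptions (not from the thread): bit 15 is the membership bit and
# bits 0-14 are the partition base; 0x8123 / 0x0123 are hypothetical
# stand-ins for "XXX" (full member) and "XXx" (partial member).

MEMBERSHIP_BIT = 0x8000  # 1 = full member, 0 = partial (limited) member
BASE_MASK = 0x7FFF       # partition base shared by full and partial keys


def is_full_member(pkey: int) -> bool:
    return bool(pkey & MEMBERSHIP_BIT)


def pkeys_match(packet_pkey: int, port_pkey: int) -> bool:
    """Same partition base and at least one full member on either side.

    Two partial members never match, which is why B -> C fails.
    """
    same_partition = (packet_pkey & BASE_MASK) == (port_pkey & BASE_MASK)
    return same_partition and (is_full_member(packet_pkey) or is_full_member(port_pkey))


def multicast_rejecters(sender: str, fabric: dict[str, int]) -> list[str]:
    """Receivers that would drop a multicast packet (e.g. an IPoIB ARP
    request) sent by `sender`; in the thread each such HCA raised a
    Bad P_Key trap (257) to the SM."""
    packet_pkey = fabric[sender]
    return sorted(
        node
        for node, port_pkey in fabric.items()
        if node != sender and not pkeys_match(packet_pkey, port_pkey)
    )


# Node A is the only full member; B..F are partial members.
fabric = {"A": 0x8123, "B": 0x0123, "C": 0x0123, "D": 0x0123, "E": 0x0123, "F": 0x0123}

print(pkeys_match(fabric["B"], fabric["A"]))  # True  (test 1: B -> A)
print(pkeys_match(fabric["B"], fabric["C"]))  # False (test 3: B -> C)
print(multicast_rejecters("B", fabric))       # ['C', 'D', 'E', 'F']

# Test 3 never gets an ARP reply, so every retry repeats the same traps.
arp_retries = 3  # hypothetical retry count
print(arp_retries * len(multicast_rejecters("B", fabric)))  # 12
```

Sending from A instead (`multicast_rejecters("A", fabric)`) returns an empty list: a full member's multicast is accepted by every partial member, so in this model only partial senders trigger the traps.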
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
