On 6/12/08, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
>
> On Thu, 2008-06-12 at 14:08 +0300, Olga Shern (Voltaire) wrote:
> >
> > On 6/12/08, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > Hi Olga,
> >
> > On Thu, 2008-06-12 at 09:46 +0300, Olga Shern wrote:
> > > Hi All,
> > >
> > > We have found something that seems like Infiniband Spec hole,
> >
> > What's the spec hole ?
> >
> > According to the Infiniband spec - partial member cannot "talk" with
> > partial member only with full member.
> > Therefore if partial member sending MC packet - all other partial
> > members of this partition will generate BAD PKEY trap.
> > It means that the behavior that we see is according to Infiniband
> > Spec - but very problematic
>
> Originally, multicast groups were all full member only and more recently
> was this extended to allow partial members and this was missed. A
> comment should be filed against the spec on this.
>
> > > This issue is system issue that prevents from partial P_Key setup to
> > > go into production.
> >
> > Indeed :-(
> >
> > > Short Setup & test description:
> > > ------------------------------------------
> > > * Node A: P_Key XXX (full member)
> > > * Node B, C, D, E, F: P_Key XXx (partial member)
> > >
> > > 1. Send ping from B -> A : ping is OK
> > > 2. Send ping from C -> A : ping is OK
> > > 3. Send ping from B -> C : no ping also OK
> > > * Get traps Bad P_Key in SM - from all HCA in the fabric both for
> > > test 1 & 2 (one time) and also for test 3 (all the time).
>
> What does all the time mean ? Does this mean with one test 3 ping, the
> traps are repeated ? If so, at what rate ?
every ping will generate ARP that will generate BAD PKEY trap

> > > Probably the ARP request that is MC traffic generate the trap in HCA,
> > > for test 1 & 2 we have only one ARP but for test 3 we send ARP all the
> > > time because we do not get any ARP reply.
> > >
> > > * The trap number SM get is 257 (HCA trap) if we will do P_Key
> > > switch enforcement we will probably get 259
> >
> > Is this with OpenSM or VSM ?
> >
> > We tested it with Voltaire SM but it should behave the same with
> > OpenSM.
>
> That's likely but I'm not sure yet.
>
> > -- Hal
> >
> > > * We get trap also from the originator of the MC traffic even
> > > though that receive switch relay error counter is increased (when out
> > > port==in port), the switch does not drop the packet ?
>
> The implementation of that counter is broken and occurs "normally". The
> increment of this counter is relatively meaningless :-(
>
> > > Additional questions/issues:
> > > * Do we have a way to suppress port traps from SMA ?? i.e. that
> > > the port will not generate traps that can "kill the SM" - as its look
> > > this is bug in the spec where we can't send any mc traffic (even ARP)
> > > when we have partial members and we do not have a way to suppress the
> > > traps.
>
> All the SM can do is TrapRepress.
>
> > > * What will happen in the HCA when we get many traps (mc packets
> > > from many nodes) and they need to keep all events until SM will
> > > acknowledge? - Is there limitation in the number of on-going
> > > traps (any HCA specific issues)?
>
> Assuming you mean events from which traps are generated, I think this is
> left as an implementation dependent detail in terms of the spec. An
> implementation needs to take care not to lose certain events; others
> like this aren't critical but that's left to the specific SMA
> implementation.
>
> -- Hal
>
> > > Best Regards
> > >
> > > Olga
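
For readers following the thread, the membership rule it turns on can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the P_Key values 0x8001 (full) and 0x0001 (partial) are made-up stand-ins for the "XXX"/"XXx" values in the test setup, the helper names are hypothetical, and the model assumes the switch does not relay the packet back to its sender (the thread reports that the originator also raised a trap). It shows why a multicast ARP from partial member B is accepted by full member A but fails the P_Key check on every other partial member.

```python
# Sketch of the P_Key match rule discussed in this thread.
# All names and P_Key values here are hypothetical illustrations.

MEMBERSHIP_BIT = 0x8000  # bit 15: 1 = full member, 0 = partial member
BASE_MASK = 0x7FFF       # low 15 bits identify the partition


def is_full(pkey):
    """True if the P_Key carries full membership."""
    return bool(pkey & MEMBERSHIP_BIT)


def pkey_match(a, b):
    """Two P_Keys match if they name the same partition and
    at least one of them is a full member."""
    same_partition = (a & BASE_MASK) == (b & BASE_MASK)
    return same_partition and (is_full(a) or is_full(b))


def bad_pkey_receivers(sender, nodes):
    """Names of nodes that would fail the P_Key check on a multicast
    packet from `sender` (and so could raise a Bad P_Key trap).
    Assumes the packet is not relayed back to the sender itself."""
    packet_pkey = nodes[sender]
    return [name for name, pkey in nodes.items()
            if name != sender and not pkey_match(packet_pkey, pkey)]


# Setup from the thread: A is a full member, B..F are partial members.
nodes = {"A": 0x8001, "B": 0x0001, "C": 0x0001,
         "D": 0x0001, "E": 0x0001, "F": 0x0001}

print(bad_pkey_receivers("B", nodes))  # ['C', 'D', 'E', 'F']
print(bad_pkey_receivers("A", nodes))  # []
```

In this model only multicast sent by a partial member is rejected anywhere, which is consistent with the observation that test 3 (B -> C) keeps producing traps: each unanswered ARP request is another multicast packet from a partial member.
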
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
