Hal Rosenstock wrote:
On Thu, 2008-06-12 at 14:08 +0300, Olga Shern (Voltaire) wrote:

On 6/12/08, Hal Rosenstock <[EMAIL PROTECTED]> wrote:

Hi Olga,

On Thu, 2008-06-12 at 09:46 +0300, Olga Shern wrote:
        > Hi All,
        >
        >
        >
        > We have found something that seems like an InfiniBand spec hole,

What's the spec hole?

According to the InfiniBand spec, a partial member cannot "talk" with
another partial member, only with a full member.
Therefore, if a partial member sends an MC packet, all other partial
members of this partition will generate a Bad P_Key trap.
This means that the behavior we see conforms to the InfiniBand
spec, but it is very problematic.
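
The matching rule described above can be sketched as follows. This is a minimal illustration, not code from any SM or HCA firmware. It assumes the usual P_Key layout (bit 15 is the membership bit, the low 15 bits are the partition base key) and uses a hypothetical base key 0x0123 in place of the elided "XXX"; the function names are made up for this example, and the check for an invalid (zero) base key is left out.

```python
FULL_MEMBER_BIT = 0x8000  # bit 15 of the 16-bit P_Key: 1 = full, 0 = partial
BASE_MASK = 0x7FFF        # low 15 bits: the partition base key


def is_full_member(pkey: int) -> bool:
    """True if the membership bit of pkey marks a full member."""
    return bool(pkey & FULL_MEMBER_BIT)


def pkeys_match(packet_pkey: int, table_pkey: int) -> bool:
    """True if a packet carrying packet_pkey passes the check against table_pkey."""
    same_partition = (packet_pkey & BASE_MASK) == (table_pkey & BASE_MASK)
    # Two partial members never match each other, even in the same partition.
    return same_partition and (
        is_full_member(packet_pkey) or is_full_member(table_pkey)
    )


# Hypothetical values: 0x0123 stands in for the elided "XXX" in the thread.
FULL = 0x8123
PARTIAL = 0x0123

print(pkeys_match(PARTIAL, FULL))     # partial -> full:    True
print(pkeys_match(FULL, PARTIAL))     # full -> partial:    True
print(pkeys_match(PARTIAL, PARTIAL))  # partial -> partial: False
```

The last case is the one that matters here: a multicast packet sent by a partial member fails this check at every other partial member of the partition.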

Originally, multicast groups were full-member only. More recently this
was extended to allow partial members, and this case was missed. A
comment should be filed against the spec on this.

        > This is a system-level issue that prevents partial P_Key setups
        > from going into production.

Indeed :-(

        > Short setup & test description:
        > ------------------------------------------
        > * Node A: P_Key XXX (full member)
        > * Node B, C, D, E, F: P_Key XXx (partial member)
        >
        > 1. Send ping from B -> A : ping is OK
        > 2. Send ping from C -> A : ping is OK
        > 3. Send ping from B -> C : no ping, which is also OK
        > * Get Bad P_Key traps in the SM from all HCAs in the fabric, both
        > for tests 1 & 2 (one time) and also for test 3 (all the time).

What does "all the time" mean? Does this mean that with one test 3 ping,
the traps are repeated? If so, at what rate?

Also, why do the HCAs issue these traps? Is pkey enforcement on
switch external ports off? AFAIK, by default, OpenSM should
configure pkeys on the switch ports that are connected to these HCAs,
so that a partial member wouldn't get packets from another partial
member.

-- Yevgeny

        > Probably the ARP request, which is MC traffic, generates the trap
        > in the HCA. For tests 1 & 2 we have only one ARP, but for test 3
        > we send ARP all the time because we do not get any ARP reply.
        >
        > * The trap number the SM gets is 257 (HCA trap); if we do P_Key
        > switch enforcement, we will probably get 259.
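
The test setup and the ARP explanation above can be modeled with a small sketch. It is only an illustration under stated assumptions: the base key 0x0123 is a hypothetical stand-in for the elided "XXX", switch-side P_Key enforcement is assumed to be off, and the multicast is assumed to reach every node in the partition including the sender (the thread reports a trap from the originator as well). The helper names are invented for this example.

```python
FULL_BIT = 0x8000   # membership bit: 1 = full member, 0 = partial member
BASE_MASK = 0x7FFF  # low 15 bits: partition base key


def pkeys_match(a: int, b: int) -> bool:
    """Same base key, and at least one of the two is a full member."""
    same_base = (a & BASE_MASK) == (b & BASE_MASK)
    return same_base and bool((a | b) & FULL_BIT)


# Hypothetical P_Keys: node A is a full member, B..F are partial members.
nodes = {"A": 0x8123, "B": 0x0123, "C": 0x0123,
         "D": 0x0123, "E": 0x0123, "F": 0x0123}


def nodes_trapping_on_multicast(sender):
    """Nodes whose P_Key check rejects a multicast packet from sender.

    Assumption: with no switch enforcement, every node (sender included)
    receives a copy and checks it against its own P_Key table entry.
    """
    sender_pkey = nodes[sender]
    return [name for name, pkey in nodes.items()
            if not pkeys_match(sender_pkey, pkey)]


print(nodes_trapping_on_multicast("B"))  # ['B', 'C', 'D', 'E', 'F']
print(nodes_trapping_on_multicast("A"))  # []
```

In this model the ARP request from B is accepted only by A, which matches tests 1 and 2 producing a single round of traps. In test 3 the target C rejects the request and never replies, so B keeps retrying and the traps repeat.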
Is this with OpenSM or VSM?

We tested it with Voltaire SM, but it should behave the same with
OpenSM.

That's likely but I'm not sure yet.

        -- Hal
        > * We also get a trap from the originator of the MC traffic, even
        > though the receive switch relay error counter is increased
        > (when out port == in port). Does the switch not drop the packet?

The implementation of that counter is broken and occurs "normally". The
increment of this counter is relatively meaningless :-(

        > Additional questions/issues:
        > * Do we have a way to suppress port traps from the SMA, i.e. so
        > that the port will not generate traps that can "kill the SM"? As
        > it looks, this is a bug in the spec: we can't send any MC traffic
        > (even ARP) when we have partial members, and we do not have a way
        > to suppress the traps.

All the SM can do is TrapRepress.

        > * What will happen in the HCA when we get many traps (MC packets
        > from many nodes) and it needs to keep all events until the SM
        > acknowledges them? Is there a limit on the number of ongoing
        > traps (any HCA-specific issues)?

Assuming you mean events from which traps are generated, I think this is
left as an implementation dependent detail in terms of the spec. An
implementation needs to take care not to lose certain events; others
like this aren't critical but that's left to the specific SMA
implementation.

-- Hal

        >
        >
        >
        >
        > Best Regards
        >
        > Olga
        >
        >
        > _______________________________________________
        > general mailing list
        > [email protected]
        > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
        >
        > To unsubscribe, please visit
        > http://openib.org/mailman/listinfo/openib-general