Hi All,
We have found something that looks like a hole in the InfiniBand spec. It is a system-level issue that prevents a partial-membership P_Key setup from going into production.

Short setup and test description:
------------------------------------------
* Node A: P_Key XXX (full member)
* Nodes B, C, D, E, F: P_Key XXx (partial member)

1. Send ping from B -> A: ping is OK
2. Send ping from C -> A: ping is OK
3. Send ping from B -> C: no ping reply, which is also OK (expected, since two partial members cannot communicate)

* The SM receives Bad P_Key traps from all HCAs in the fabric: once for tests 1 and 2, and continuously for test 3. The trap is probably generated in the HCAs by the ARP request, which is multicast (MC) traffic. In tests 1 and 2 there is only one ARP request, but in test 3 ARP requests are sent over and over because no ARP reply ever arrives.
* The trap number the SM receives is 257 (HCA trap). If we enable P_Key enforcement on the switches, we will probably get trap 259 instead.
* We also get a trap from the originator of the MC traffic, even though the switch's receive switch relay error counter is incremented (when out port == in port). Does the switch not drop the packet?

Additional questions/issues:
* Is there a way to suppress port traps from the SMA, i.e. to keep the port from generating traps that can "kill the SM"? This looks like a bug in the spec: we cannot send any MC traffic (not even ARP) when there are partial members, and we have no way to suppress the traps.
* What happens in the HCA when it generates many traps (MC packets from many nodes) and has to keep all events until the SM acknowledges them? Is there a limit on the number of outstanding traps (any HCA-specific issues)?

Best Regards
Olga
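The behaviour in tests 1 to 3 follows from the P_Key matching rule: two P_Keys match only when their low 15 bits (the partition base) are equal and at least one of them has bit 15 (the full-membership bit) set. The Python sketch that follows is illustrative only. The value 0x0001 is a hypothetical stand-in for the elided "XXX" partition, and `pkey_match` and `mc_receivers_that_trap` are hypothetical helpers, not functions from any IB stack. It shows why a multicast ARP sent by a partial member reaches the other partial members, fails their P_Key check, and so raises one Bad P_Key trap per partial receiver.

```python
# Illustrative sketch of the InfiniBand P_Key match rule (hypothetical helpers).
FULL_MEMBER_BIT = 0x8000  # bit 15: 1 = full member, 0 = partial (limited) member
BASE_MASK = 0x7FFF        # low 15 bits identify the partition


def pkey_match(a: int, b: int) -> bool:
    """Return True if two 16-bit P_Keys allow communication:
    same partition base and at least one side is a full member."""
    same_partition = (a & BASE_MASK) == (b & BASE_MASK)
    one_full_member = bool((a | b) & FULL_MEMBER_BIT)
    return same_partition and one_full_member


def mc_receivers_that_trap(sender: str, members: dict) -> list:
    """Given the sender and a name -> P_Key map of multicast group members,
    return the receivers whose P_Key check fails (each would raise a trap).
    The originator's own trap is not modelled here."""
    sender_key = members[sender]
    return sorted(
        name
        for name, key in members.items()
        if name != sender and not pkey_match(sender_key, key)
    )


if __name__ == "__main__":
    # Hypothetical keys: A is a full member, B..F are partial members.
    members = {"A": 0x8001, "B": 0x0001, "C": 0x0001,
               "D": 0x0001, "E": 0x0001, "F": 0x0001}
    print(pkey_match(members["B"], members["A"]))  # True  (test 1: B -> A works)
    print(pkey_match(members["B"], members["C"]))  # False (test 3: B -> C fails)
    print(mc_receivers_that_trap("B", members))    # ['C', 'D', 'E', 'F']
```

Because each ARP retry in test 3 repeats this multicast fan-out, every retry produces another round of traps from the partial members, whereas in tests 1 and 2 the single ARP request produces only one round.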
_______________________________________________
general mailing list
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
