On Sun, 27 Apr 2008 11:47:54 +0300 Or Gerlitz <[EMAIL PROTECTED]> wrote:
> Ira Weiny wrote:
> >
> > I did not get any output with multicast_debug_level!
>
> why should you, as from the node's point of view nothing has happened
> (the exact param name is mcast_debug_level)
>
> > Here is a patch which fixes the problem. (At least with the partial
> > sub-nets configuration I explained before.) I will have to verify
> > this fixes the problem I originally reported.
>
> OK, good. Does this problem exist in the released openSM? if yes, what
> would be the trigger for the SM to "really discover" (i.e do PortInfo
> SET) this sub-fabric and how much time would it take to reach this
> trigger, worst case wise?

Yes, this is in the current released version of OpenSM, AFAICT. The
trigger is: the single link separating the partial sub-net will come up,
and that trap will cause OpenSM to resweep. I believe this will happen on
the next resweep cycle, which is by default 10 sec. (But this is
configurable.) I don't think there is an issue with allowing OpenSM to
resweep as designed.

> The failure configuration you have set to reproduce the problem is very
> untypical, I think.

I agree. I made a patch to turn off the processing of MADs in the kernel
to test my original theory, that the node is not responding to MADs.
Using this patch I have been able to verify that if a node stops
responding, the rereg is sent by OpenSM when the node comes back.

See my next email response to Sasha concerning the original issue.

Ira

> Since under common clos etc topologies which don't
> have a 1:n blocking nature, failure of such link would cause re-route
> etc by the SM which would not (and should not) be noted by the nodes (I
> hope I am not falling into another problem here...)
>
> Or.
