Hi Eitan,

On 07:56 Fri 27 Jul , Eitan Zahavi wrote:
> >
> > On 09:26 Thu 26 Jul , Eitan Zahavi wrote:
> > >
> > > I am happy you actually use the simulator.
> > > Please provide more info regarding the failure. You should tar
> > > compress the /tmp/ibmgtsim.XXXX of your run.
> >
> > I can send this for you if you want, but the failure is trivial.
> No need if you already know where the bug is...
> >
> > Yes, and it is due to (6), where default Pkey is removed
> > "externally". I'm not sure that OpenSM should handle the case
> > when pkey table is modified externally by something which is not SM.
> >
>
> For a few years it just worked fine. So I wonder why this functionality
> was removed?
> It is a real BAD case where Pkeys are altered but I think it would be
> wise to "refresh" these tables on heavy sweep.

We discussed how and when the port tables refresh should be done just a
few days ago in this thread. My impression was that we are "in sync"
about this.

> In general it seems OpenSM has lost its "heavy sweep" concept. Now it
> does not refresh the fabric setup even on heavy sweep.

Not on each heavy sweep, but it does when it is needed or when data
could change. I don't think the concept was changed, just optimized.
Let's just look at the numbers:

$ time ./opensm/opensm -e -f ./osm.log -o
...
SUBNET UP
Exiting SM

real    0m7.995s
user    0m4.488s
sys     0m6.072s

$ time ./opensm/opensm -e -f ./osm.log -o --qos
...
SUBNET UP
Exiting SM

real    0m22.521s
user    0m10.921s
sys     0m17.173s

These are simulated runs (with ibsim); the fabric is ~1300 nodes. The
only difference is the '--qos' flag, so OpenSM skips the SL2VL and VLArb
updates in the first run and does them in the second: sweep times are
8 against 22 seconds.

> This is assuming a "perfect" HW and software and I really think we
> should have preserved that capability.

What about an option? Now, with the subn->need_update flag (which always
enforces updates), it is trivial to implement.

> Note that a "heavy sweep" does not happen unless something changed or
> trapped.

Yes, for example some port was connected/disconnected, some node
rebooted, etc. OpenSM starts a huge heavy sweep, it takes a while, SA is
not responsive most of the time, TCP connections over IPoIB time out,
applications fail. This is production experience... :(

Sasha
