Hal Rosenstock wrote:
>
>> Fixing the cable will solve our problem, but I still think something should
>> be done about this.
>>
>> Though OpenSM behaviour was OK, it was really difficult to find where the
>> performance problems came from.
>
> There should be some log messages as to the trap rate being exceeded.
> Were they not present? Which OpenSM version?

The only messages we had were the events on trap reception (so a whole lot of
them). However, we didn't check that before spending quite some time trying to
understand where the performance loss could come from.

OpenSM is git head + Bull_patches on top.

>
>> All our diagnostic tools (mostly using infiniband diags) were failing to
>> see the problem.
>> Infiniband diags commands fail toward the faulty port, but it was hard to say
>> if the port was faulty or if it was due to high load on the SM and dropped
>> VL15 messages.
>
> Yes, the only thing you would observe is VL15 drops via perfquery. The
> SM is the one which should be logging the trap originator, which is the
> way to diagnose this issue.
>

It is, actually, though the port number is missing from the log message.

Nicolas
