Hal Rosenstock wrote:
> 
>> Fixing the cable will solve our problem, but I still think something should 
>> be done about this.
>>
>> Though OpenSM's behaviour was OK, it was really difficult to find where the 
>> performance problems came from.
> 
> There should be some log messages about the trap rate being exceeded.
> Were they not present? Which OpenSM version?

The only messages we had were the events logged on trap reception (so really 
a lot of them). However, we didn't check those before spending quite some time 
trying to understand where the performance loss could come from.
OpenSM is git head + Bull_patches on top.

> 
>> All our diagnostic tools (mostly using infiniband-diags) were failing to 
>> see the problem.
>> The infiniband-diags commands failed toward the faulty port, but it was hard 
>> to say whether the port was faulty or whether it was due to high load on the 
>> SM and dropped VL15 messages.
> 
> Yes, the only thing you would observe is VL15 drops via perfquery. The
> SM is the one which should be logging the trap originator which is the
> way to diagnose this issue.
> 

It actually is, though the log message is missing the port number.
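
For the VL15 drops mentioned above, here is a minimal sketch of how the counter could be watched from a script. It only parses text; it does not run perfquery itself. The function names and the sample text are made up for illustration, and the dotted "VL15Dropped:....N" layout is an assumption about how your perfquery version prints counters, so adjust the pattern if yours differs.

```python
import re

# Matches lines such as "VL15Dropped:.....................12".
# The dotted padding is an assumption about perfquery's output layout;
# adjust the pattern if your version prints the counter differently.
_VL15_RE = re.compile(r"^VL15Dropped:\.*(\d+)\s*$", re.MULTILINE)


def vl15_dropped(perfquery_text):
    """Return the VL15Dropped value found in perfquery-style text, or None."""
    match = _VL15_RE.search(perfquery_text)
    return int(match.group(1)) if match else None


def vl15_delta(before_text, after_text):
    """Return the change in VL15Dropped between two samples.

    Returns None if the counter is missing from either sample. Assumes the
    counter was neither reset nor saturated between the two samples.
    """
    before = vl15_dropped(before_text)
    after = vl15_dropped(after_text)
    if before is None or after is None:
        return None
    return after - before


# Illustrative sample text (hypothetical, not captured from a real fabric).
SAMPLE_BEFORE = (
    "PortSelect:......................1\n"
    "VL15Dropped:.....................10\n"
)
SAMPLE_AFTER = (
    "PortSelect:......................1\n"
    "VL15Dropped:.....................250\n"
)

if __name__ == "__main__":
    print(vl15_delta(SAMPLE_BEFORE, SAMPLE_AFTER))
```

A steadily growing delta on the SM port would at least point toward VL15 pressure on the SM rather than a single faulty port, though it does not identify the trap originator by itself.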

Nicolas