I'd be inclined to look at something like:

ibqueryerrors -s PortXmitWait,LinkDownedCounter,PortXmitDiscards,PortRcvRemotePhysicalErrors -c

and see if you have a high number of symbol errors. It might be that a cable needs replugging or replacing.

Simon

From: <[email protected]> on behalf of "J. Eric Wonderley" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, 17 January 2017 at 21:16
To: "[email protected]" <[email protected]>
Subject: [gpfsug-discuss] rmda errors scatter thru gpfs logs

I have messages like these frequently in my logs:

Tue Jan 17 11:25:49.731 2017: [E] VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 vendor_err 136
Tue Jan 17 11:25:49.732 2017: [E] VERBS RDMA closed connection to 10.51.10.5 (cl005) on mlx5_0 port 1 fabnum 0 due to RDMA write error IBV_WC_REM_ACCESS_ERR index 23

Any ideas on the cause?
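
Before pulling cables, it may help to tally the quoted "[E] VERBS RDMA" lines per peer, to see whether the errors cluster on one node or HCA port or are spread across the fabric. The following is a minimal sketch only: the regular expression is inferred from the two sample lines quoted above, and the function name tally_rdma_errors is hypothetical, not part of GPFS.

```python
import re
import sys
from collections import Counter

# Pattern inferred from the sample log lines in this thread (an assumption,
# not a documented GPFS log format). It matches only the "rdma <op> error"
# lines, so the follow-up "closed connection" lines are not double counted.
PATTERN = re.compile(
    r"\[E\] VERBS RDMA rdma (?P<op>\w+) error (?P<status>IBV_WC_\w+) "
    r"to (?P<ip>\S+) \((?P<host>[^)]+)\) on (?P<dev>\S+) port (?P<port>\d+)"
)


def tally_rdma_errors(lines):
    """Count RDMA error lines per (peer host, local device, port, status)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (match["host"], match["dev"], match["port"], match["status"])
            counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log file path is supplied by the caller on the command line.
    with open(sys.argv[1], errors="replace") as log:
        for key, count in tally_rdma_errors(log).most_common():
            print(count, *key)
```

Run it with the path to the GPFS log as the only argument. If nearly all errors name the same peer or port, that link is the first place to check with ibqueryerrors.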
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
