Hello, I'm new to SRP & IB, so please bear with me...
We have a storage server running RHEL 5.1 with the bundled OFED 1.2 stack, directly attached to an IB port on a DDN 9550. It ran OK for about a week, but today we're getting a continuous stream of SRP abort errors:

  # tail /var/log/messages
  [...]
  Jan 8 17:00:59 server kernel: SRP abort called
  Jan 8 17:01:59 server kernel: SRP abort called
  Jan 8 17:02:04 server kernel: SRP reset_device called
  Jan 8 17:02:09 server kernel: ib_srp: SRP reset_host called
  Jan 8 17:02:11 server kernel: ib_srp: connection closed

How can I determine the cause of the aborts? The physical connection between the server and the DDN seems to be OK (the error counters in /sys/class/infiniband/mthca0/ports/1/counters/* are all zero), and the SM (opensm) is still running.

Are the aborts being triggered by the server or by the storage target (the DDN)? I'm guessing something is timing out, but what, and why?

To complicate matters, the LUNs on the DDN are shared with 7 other servers as clustered LVM volumes with GFS filesystems. Each of the other servers has its own direct IB connection to the DDN.

Any suggestions on how to track down the cause of the aborts would be welcome.

Thanks,

John
----------------------------------------------------------------------
John Valdes
Mathematics and Computer Science Division
Argonne National Laboratory
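This is a minimal shell sketch of the two checks described in the post. It lists any nonzero error counters under /sys/class/infiniband and tallies the SRP error-handling messages in a syslog file so that their timing is easier to see. The function names (`ib_error_counters`, `srp_events`, `srp_abort_gaps`) are hypothetical and are not part of OFED. The counter filter assumes that counters whose names end in `_data`, `_packets` or `_wait` measure traffic rather than errors. The message patterns are copied from the log excerpt above, so other ib_srp versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OFED) for checking IB port error
# counters and SRP error-handling messages. A sketch, not a diagnosis tool.

# Print "<path> <value>" for every nonzero counter under a sysfs-style tree
# (default /sys/class/infiniband). Assumption: names ending in _data,
# _packets or _wait are traffic counters, not errors, so they are skipped.
ib_error_counters() {
    root=${1:-/sys/class/infiniband}
    for f in "$root"/*/ports/*/counters/*; do
        [ -r "$f" ] || continue            # also skips an unmatched glob
        case ${f##*/} in
            *_data|*_packets|*_wait) continue ;;
        esac
        v=$(cat "$f" 2>/dev/null) || continue
        case $v in
            ''|*[!0-9]*) continue ;;       # ignore empty or non-numeric values
        esac
        if [ "$v" -ne 0 ]; then
            printf '%s %s\n' "$f" "$v"
        fi
    done
}

# Count SRP error-handling events by type in a syslog file
# (default /var/log/messages), one "<type> <count>" line per type seen.
srp_events() {
    log=${1:-/var/log/messages}
    awk '
        /SRP abort called/          { n["abort"]++ }
        /SRP reset_device called/   { n["reset_device"]++ }
        /SRP reset_host called/     { n["reset_host"]++ }
        /ib_srp: connection closed/ { n["connection_closed"]++ }
        END { for (k in n) printf "%s %d\n", k, n[k] }
    ' "$log" | LC_ALL=C sort
}

# Print "<time> <seconds since previous abort>" for each "SRP abort called"
# line after the first. Assumption: classic syslog timestamps
# ("Mon DD HH:MM:SS" in fields 1-3) and no midnight rollover between entries.
srp_abort_gaps() {
    log=${1:-/var/log/messages}
    awk '/SRP abort called/ {
        split($3, t, ":")
        s = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen) printf "%s %d\n", $3, s - prev
        prev = s; seen = 1
    }' "$log"
}
```

After sourcing these functions, running `ib_error_counters` with no argument scans the real sysfs tree. Run against the five log lines quoted above, `srp_abort_gaps` prints `17:01:59 60`, which means the two aborts were exactly 60 seconds apart. That is at least consistent with a fixed timeout expiring, although it does not show which side is responsible.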
