EXTRA NOTES:

1. Pull the cable and plug it back in (or use ibportstate disable/enable):
a. Within 30 seconds, I/Os resume on the same path (with the same cm_id, qp and cq)
b. Within 30-45 seconds, I/Os resume on the same path (with a new cm_id, qp and cq)
c. After more than 45 seconds, I/Os fail over to the next path

2. After running the test for a while, I stop the test, run *multipath -F* and unload the ib_srp module. With RHEL 5 and 5.1, I can unload ib_srp cleanly; however, I got an *srp is in use* error on SLES 10 SP1.
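
The recovery windows in note 1 can be summarized in a small lookup. This is only a rough Python sketch of the observed behavior, not driver code: the function name is made up for illustration, and treating exactly 30 and exactly 45 seconds as belonging to the lower window is an assumption, since the note does not say which side the boundaries fall on.

```python
def expected_recovery(outage_seconds: float) -> str:
    """Map a link outage duration to the behavior observed in note 1.

    Boundary handling (exactly 30 s or 45 s) is an assumption.
    """
    if outage_seconds <= 30:
        # Case a: the existing connection objects survive the outage.
        return "resume on same path, same cm_id/qp/cq"
    if outage_seconds <= 45:
        # Case b: the path recovers, but the connection is recreated.
        return "resume on same path, new cm_id/qp/cq"
    # Case c: dm-multipath moves the I/O to another path.
    return "fail over to next path"


if __name__ == "__main__":
    for seconds in (10, 40, 90):
        print(seconds, "->", expected_recovery(seconds))
```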

   -vu

The following patches help SRP/dm-multipath fail over within 60 seconds (bugzilla #577) without data corruption or read/write errors:

1. srp_disconnect_without_wait.patch - srp sends the disconnect request without waiting for the CM timewait exit event, since srp currently does not re-use the cm_id and qp/cq of a connection (srp_1_recreate_at_reconnect.patch, already in kernel_patches/fixes, recreates the cm_id and qp/cq for a connection at reconnect)
2. srp_qp_in_err_timer_reconnect_target.patch - on detecting a post_send/post_receive error, srp sets qp_in_error, sets a timer to reconnect to the target, returns SCSI_MLQUEUE_HOST_BUSY to lock the queue, and returns DID_NO_CONNECT when the target state is DEAD or REMOVED

Here is my multipath.conf:
defaults {
       udev_dir                /dev
       polling_interval        5
       selector                "round-robin 0"
       path_grouping_policy    multibus
       getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
       prio_callout            /bin/true
       path_checker            readsector0
       rr_min_io               100
       rr_weight               priorities
       failback                immediate
       no_path_retry           5
       user_friendly_names     no
}
I also set srp_daemon.sh to rescan the fabric every 60 seconds (instead of the default 300 seconds)

I ran a data integrity test against /dev/mapper/<devices> while looping over {disable path 1, sleep 90, enable path 1, sleep 60, disable path 2, sleep 90, enable path 2, sleep 60}
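
For anyone wanting to reproduce the loop, here is a minimal Python sketch of the schedule. It is not the script I actually ran: `failover_cycle`, `run` and the `set_path_state` callback are hypothetical names, and how a path is really disabled or enabled (pulling a cable, ibportstate, etc.) is left to the caller.

```python
import time
from typing import Callable, Iterator, Tuple


def failover_cycle(paths=(1, 2), down_s=90, up_s=60) -> Iterator[Tuple[str, int, int]]:
    """Yield (action, path, seconds_to_sleep) for one pass over all paths."""
    for path in paths:
        yield ("disable", path, down_s)  # take the path down, wait 90 s
        yield ("enable", path, up_s)     # bring it back, wait 60 s


def run(cycles: int,
        set_path_state: Callable[[int, str], None],
        sleep: Callable[[float], None] = time.sleep) -> None:
    """Repeat the cycle; set_path_state is a caller-supplied (hypothetical) hook."""
    for _ in range(cycles):
        for action, path, pause in failover_cycle():
            set_path_state(path, action)
            sleep(pause)


if __name__ == "__main__":
    # Dry run: print the actions instead of touching any port, and skip sleeping.
    run(1, lambda path, action: print(action, "path", path), sleep=lambda s: None)
```

The data integrity test itself runs separately against the multipath device while this loop toggles the paths.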

RHEL 5 and 5.1 work very well (no data corruption or read/write failures reported)
For SLES 10 SP1, it works well as long as I run *multipath* every 60 seconds. I think I misconfigured multipathd somehow (here is how I set it up: using the same multipath.conf above, chkconfig boot.multipath on and chkconfig multipathd on)

  -vu





_______________________________________________
ewg mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
