On 16/09/13 16:25, Orlando Richards wrote:
Hi folks,

We're building a new storage service and are planning on using
multipathd rather than LSI's rdac to handle the multipathing.

It's all working well, but I'm looking at settling on the final
parameters for the multipath.conf. In particular, the values for:

  * rr_min_io (1?)
  * failback (I think "manual" or "followover"?)
  * no_path_retry (guessing here - fail?)
  * dev_loss_tmo (guessing here - 15?)
  * fast_io_fail_tmo (guessing here - 10?)

Does anyone have a working multipath.conf for LSI-based storage systems
(or others, for that matter), and/or experience and wisdom to share
on the above settings (and any others I may have missed)? Any war
stories about dm-multipath to share?



Hi all,

Thanks for the feedback on all this. From that, and more digging and testing, we've settled on the following multipath.conf stanzas:

                path_grouping_policy group_by_prio
                prio    rdac
                path_checker    rdac
                path_selector   "round-robin 0"
                hardware_handler        "1 rdac"
                features        "2 pg_init_retries 50"
                # All "standard" up to here

                # Prevent ping-ponging of controllers, but
                # allow for automatic failback
                failback        followover
                # Massively accelerate the failure detection time
                # (default settings give ~30-90 seconds, this gives ~5s)
                fast_io_fail_tmo 5
                # Keep the /dev device entries in situ for 90 seconds,
                # in case of rapid recovery of paths
                dev_loss_tmo    90
                # Don't queue traffic down a failed path
                no_path_retry   fail
                # balance much more aggressively across the active paths
                rr_min_io       1
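
For reference, here is the same set of options wrapped into a complete devices stanza. The vendor and product strings below are placeholders, not taken from the original post — they must match whatever your array actually reports (e.g. in `multipath -ll` output):

```
devices {
    device {
        # Placeholder identifiers - substitute the vendor/product
        # strings your own array reports
        vendor                  "LSI"
        product                 "VirtualDisk"
        path_grouping_policy    group_by_prio
        prio                    rdac
        path_checker            rdac
        path_selector           "round-robin 0"
        hardware_handler        "1 rdac"
        features                "2 pg_init_retries 50"
        failback                followover
        fast_io_fail_tmo        5
        dev_loss_tmo            90
        no_path_retry           fail
        rr_min_io               1
    }
}
```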

The primary goal was rapid and reliable failover in a cluster environment (without ping-ponging). The defaults from multipathd gave a 30-90 second pause in I/O every time a path went away; with the above settings we've got that down to ~5s.

Note that we've not tried this "in production" yet, but it has held up fine under heavy benchmark load.

Along the way we discovered an odd GPFS "feature": if some nodes in the cluster use RDAC (and thus have /dev/sdXX devices) and some use multipathd (and thus use /dev/dm-XX devices), then the nodes can either fail to find attached NSD devices (in the case of an RDAC host where the NSDs were initially created on a multipath host) or can try to talk to them down the wrong device (for instance, talking to /dev/sdXX rather than /dev/dm-XX). We only set up this mixed environment to compare rdac with dm-multipath, and don't expect to put it into production, but it's the kind of thing that could crop up in a system migrating from RDAC to dm-multipath, or vice versa. It seems that on creation the NSD is tagged somewhere as either "dmm" (dm-multipath) or "generic" (rdac), and servers using one type can't see the other.

We're testing a workaround for the "dm-multipath server accessing via /dev/sdXX" case just now: create the following (executable, root-owned) script at /var/mmfs/etc/nsddevices on the dm-multipath hosts:

#!/bin/ksh
#
# This script ensures that GPFS does not use the raw /dev/sd* devices,
# but the multipath /dev/dm-* devices instead.
for dev in $(awk '/dm-/ {print $4}' /proc/partitions)
do
    echo "$dev generic"
done

# skip the GPFS device discovery
exit 0


except change that simple "$dev generic" echo to one which says "$dev dmm" or "$dev generic", depending on whether the NSD was created with dm-multipath or rdac attached hosts. The reverse would likely also work to get the rdac host to pick up the dm-multipath-created NSDs (echo "$dev dmm" for the /dev/sdXX devices).
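
A minimal sketch of what that modified script might look like. The DMM_DEVS list is a hypothetical stand-in for however you decide which devices carry NSDs created under dm-multipath; the device names are fed in from an embedded sample here so the classification logic can be shown self-contained, whereas the real script would read them from /proc/partitions as above:

```shell
#!/bin/sh
# Hypothetical list of devices whose NSDs were created on a
# dm-multipath host; everything else gets reported as "generic".
DMM_DEVS="dm-0 dm-2"

# classify DEV: print "DEV dmm" if DEV is in DMM_DEVS, else "DEV generic"
classify() {
    case " $DMM_DEVS " in
        *" $1 "*) echo "$1 dmm" ;;
        *)        echo "$1 generic" ;;
    esac
}

# The real script would loop over /proc/partitions; a sample device
# list is embedded here for illustration.
printf 'dm-0\ndm-1\ndm-2\n' | while read dev
do
    classify "$dev"
done

# The real script would then "exit 0" to skip GPFS's own discovery.
```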

Thankfully, we have no plans to mix the environment - but for future reference it could be important (if ever migrating existing systems from rdac to dm-multipath, for instance).







--
   Dr Orlando Richards
  Information Services
IT Infrastructure Division
       Unix Section
    Tel: 0131 650 4994
  skype: orlando.richards

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
