Ok, so I did some more testing. It appears that the problem isn't
specific to the dev_loss_tmo and fast_io_fail_tmo settings, as the
terminal log below shows. In multipath.conf (which we know for certain
is being read, since the created multipath map gets the correct alias),
I instruct it to use the ALUA hardware handler for all devices. However,
for some reason this is ignored, and the EMC hardware handler is used
instead:

=====
root@ucstest-osl2:~# cat /etc/multipath.conf 
devices {
        device {
                vendor ".*"
                product ".*"
                hardware_handler "1 alua"
        }
}

multipaths {
        multipath {
                wwid 3600601603a71320022967e0a1f38e411
                alias bootvolume
        }
}
root@ucstest-osl2:~# multipath -v 2
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 emc' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0  undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
=====

This does *NOT* happen on RHEL-based distros - on those, changing the
hardware_handler in multipath.conf in this way works as expected.

So why does it use the EMC hardware_handler? Well, there's a built-in
default device section that matches the array in question, and that
section appears to take precedence over my user-specified config from
multipath.conf:

=====
root@ucstest-osl2:~# multipathd -k'show config' | grep -B10 -A4 '1 emc'
        device {
                vendor "DGC"
                product ".*"
                product_blacklist "LUNZ"
                path_grouping_policy group_by_prio
                getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                path_selector round-robin 0
                path_checker emc_clariion
                checker emc_clariion
                features "1 queue_if_no_path"
                hardware_handler "1 emc"
                prio emc
                failback immediate
                no_path_retry 60
        }
=====

If I copy the entire default device config into /etc/multipath.conf and
only change the hardware_handler setting, then it starts working:

=====
root@ucstest-osl2:~# cat /etc/multipath.conf 
devices {
        device {
                vendor "DGC"
                product ".*"
                product_blacklist "LUNZ"
                path_grouping_policy group_by_prio
                getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                path_selector "round-robin 0"
                path_checker emc_clariion
                checker emc_clariion
                features "1 queue_if_no_path"
                hardware_handler "1 alua"
                prio emc
                failback immediate
                no_path_retry 60
        }
}

multipaths {
        multipath {
                wwid 3600601603a71320022967e0a1f38e411
                alias bootvolume
        }
}
root@ucstest-osl2:~# multipath -v 2
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0  undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
=====

It would appear that, for some reason, overriding a built-in default
device entry in Ubuntu requires an *exact* string match on the «vendor»
and «product» settings between multipath.conf and the built-in entry. If
I change e.g. «product» in multipath.conf to ".*.*", it goes back to
using the built-in defaults, ignoring multipath.conf. I consider this
behaviour very dangerous. Suppose an admin has a working config (thanks
to exactly matching vendor/product settings), and a package update then
changes the built-in defaults, e.g. to incorporate some new model
matching the same profile/settings. At that point the admin's working
config will stop being used, possibly causing disruptive problems. I
therefore strongly suggest you figure out why Ubuntu behaves differently
from RHEL, and adopt the RHEL behaviour, which really is the only
sensible one.
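
To make the observed behaviour concrete, here is a small sketch of the
rule the three experiments above seem to follow. It is an illustration
only: the function names are made up, and this is not how
multipath-tools is actually implemented. It assumes one built-in entry
(vendor "DGC", product ".*", handler "1 emc") and a user entry asking
for "1 alua".

```shell
#!/bin/sh
# Illustration only; hypothetical names, not multipath-tools code.

# True when a device entry's vendor/product regexes match a device.
# Args: entry_vendor entry_product dev_vendor dev_product
entry_matches_device() {
    printf '%s\n' "$3" | grep -Eq -- "$1" &&
    printf '%s\n' "$4" | grep -Eq -- "$2"
}

# True when the user entry replaces the built-in one under the behaviour
# observed here: the vendor and product strings must be identical.
# Args: user_vendor user_product builtin_vendor builtin_product
replaces_builtin() {
    [ "$1" = "$3" ] && [ "$2" = "$4" ]
}

# Hardware handler that ends up in use for the DGC array.
# Args: user_vendor user_product
pick_handler() {
    if replaces_builtin "$1" "$2" "DGC" ".*"; then
        echo "1 alua"    # user entry replaced the built-in entry
    else
        echo "1 emc"     # built-in entry is still used
    fi
}

# pick_handler ".*"  ".*"     prints: 1 emc
# pick_handler "DGC" ".*"     prints: 1 alua
# pick_handler "DGC" ".*.*"   prints: 1 emc
```

All three user entries match the device when treated as regexes
(entry_matches_device succeeds for "DGC"/"VRAID" in each case), yet only
the one whose strings are identical to the built-in entry takes effect.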

In any case, now that I know how to ensure my multipath.conf settings
are being used, I re-tried adding dev_loss_tmo and fast_io_fail_tmo, but
it still doesn't work:

=====
root@ucstest-osl2:~# cat /etc/multipath.conf 
devices {
        device {
                vendor "DGC"
                product ".*"
                product_blacklist "LUNZ"
                path_grouping_policy group_by_prio
                getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                path_selector "round-robin 0"
                path_checker emc_clariion
                checker emc_clariion
                features "1 queue_if_no_path"
                hardware_handler "1 alua"
                prio emc
                failback immediate
                no_path_retry 60
                fast_io_fail_tmo 3
                dev_loss_tmo 2147483647
        }
}

multipaths {
        multipath {
                wwid 3600601603a71320022967e0a1f38e411
                alias bootvolume
        }
}
root@ucstest-osl2:~# multipath -v 2
Aug 29 10:39:57 | bootvolume failed to set /class/fc_remote_ports/rport-0:0-1/dev_loss_tmo
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0  undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
root@ucstest-osl2:~# grep . /sys/class/fc_remote_ports/rport-*/*tmo
/sys/class/fc_remote_ports/rport-0:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-0:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-0:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-2/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-2/fast_io_fail_tmo:off
=====
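
One way to narrow down the failed write in the log above is to try the
same sysfs writes by hand. The sketch below is a hypothetical helper
(set_rport_tmo is not part of any package). It writes fast_io_fail_tmo
before dev_loss_tmo, on the assumption that the kernel's FC transport
class rejects a dev_loss_tmo above 600 while fast_io_fail_tmo is still
"off". That is my understanding of the kernel behaviour, and it is worth
verifying on the kernel in question.

```shell
#!/bin/sh
# Sketch: apply FC remote port timeouts by hand, fast_io_fail_tmo first.
# Assumption: a large dev_loss_tmo is refused while fast_io_fail_tmo is
# "off", so the order of the two writes matters.

# set_rport_tmo DIR FAST_IO_FAIL DEV_LOSS
# DIR is one rport directory, e.g. /sys/class/fc_remote_ports/rport-0:0-1
set_rport_tmo() {
    dir=$1
    # Write fast_io_fail_tmo first and report which write failed, if any.
    if ! echo "$2" > "$dir/fast_io_fail_tmo"; then
        echo "failed to set $dir/fast_io_fail_tmo" >&2
        return 1
    fi
    if ! echo "$3" > "$dir/dev_loss_tmo"; then
        echo "failed to set $dir/dev_loss_tmo" >&2
        return 1
    fi
    # Show the values that are now in place.
    grep . "$dir/fast_io_fail_tmo" "$dir/dev_loss_tmo"
}

# Usage (as root), with the values from multipath.conf above:
#   for r in /sys/class/fc_remote_ports/rport-*; do
#       set_rport_tmo "$r" 3 2147483647
#   done
```

If the manual writes succeed in this order but fail in the reverse
order, that would point at the order in which the values are applied
rather than at the values themselves.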

The *_tmo settings were read and understood by the config file parser,
as I can see them appear in the output from «multipathd -k'show config'».
It is also clear that they are recognised as supported options, because
if I add another «foo» option with the value of «bar» right below them,
that one does *not* show up in «multipathd -k'show config'». So the
config parser doesn't just blindly read in any settings it encounters.
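
The check described above can be scripted roughly as follows. This is a
minimal sketch; report_options is a hypothetical helper that only looks
for option names at the start of a line, so it says whether an option
appears anywhere in the dumped config, not which device section it
belongs to.

```shell
#!/bin/sh
# report_options OPTION...
# Reads config text on stdin and prints "present" or "missing" for each
# option name given as an argument.
report_options() {
    cfg=$(cat)
    for opt in "$@"; do
        # Match the option name as the first word on a line.
        if printf '%s\n' "$cfg" |
           grep -Eq "^[[:space:]]*$opt([[:space:]]|\$)"; then
            echo "$opt present"
        else
            echo "$opt missing"
        fi
    done
}

# Usage:
#   multipathd -k'show config' | report_options dev_loss_tmo fast_io_fail_tmo foo
```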

So the settings are parsed, but they are evidently not applied to the
rports. In any case, if you need it, I'd be happy to give you access to
this test machine so you can see for yourself, Mathieu. Find me on the
NetworkManager IRC channel if you're interested in that.

Tore

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1435706

Title:
  DevLossTO, FastIoFailTO settings do not match multipath.conf expected
  values

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1435706/+subscriptions

