On Thu, 2026-02-12 at 13:05 -0500, Benjamin Marzinski wrote:
> Commit 5c977f102315 ("dm-mpath: Don't grab work_mutex while probing
> paths") added code to make multipath quit probing paths early if it
> was trying to suspend. This isn't necessary. It was just an
> optimization to try to keep path probing from delaying a suspend.
> However, it causes problems with the intended user of this code, qemu.
> The path probing code was added because failed ioctls to multipath
> devices don't cause paths to fail in cases where a regular IO failure
> would.
>
> If an ioctl to a path failed because the path was down, and the
> multipath device had passed presuspend, the DM_MPATH_PROBE_PATHS ioctl
> would exit early, without probing the path. The caller would then
> retry the original ioctl, hoping to use a different path. But if there
> was only one path in the pathgroup, it would pick the same non-working
> path again, even if there were working paths in other pathgroups.
>
> ioctls to a suspended dm device will return -EAGAIN, notifying the
> caller that the device is suspended, but ioctls to a device that is
> just preparing to suspend won't (and in general, shouldn't). This
> means that the caller (qemu in this case) would get into a tight loop
> where it would issue an ioctl that failed, skip probing the paths
> because the device had already passed presuspend, and start over
> issuing the ioctl again. This would continue until the multipath
> device finally fully suspended, or the caller gave up and failed the
> ioctl.
>
> multipath's path probing code could return -EAGAIN in this case, and
> the caller could delay a bit before retrying, but the whole purpose of
> skipping the probe after presuspend was to speed things up, and that
> would just slow them down. Instead, remove the is_suspending flag and
> check dm_suspended() to decide whether to exit the probing code early.
> This means that when the probing code exits early, future ioctls will
> also be delayed, because the device is fully suspended.
>
> Fixes: 5c977f102315 ("dm-mpath: Don't grab work_mutex while probing paths")
> Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
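
For anyone following along, the retry loop described in the quoted commit
message can be illustrated with a small toy model. This is only a sketch
under stated assumptions, not the kernel code: the class `ToyMpath`, the
function `caller`, the `skip_when_suspending` switch, and the path names are
all hypothetical, and "probing" is reduced to switching to the first working
path. It compares bailing out of the probe once presuspend has run with
bailing out only when the device is fully suspended.

```python
from dataclasses import dataclass


@dataclass
class ToyMpath:
    """Toy stand-in for a multipath device (hypothetical, not kernel code)."""
    paths: dict               # path name -> True if the path works
    current: str              # path the next ioctl will be sent to
    suspending: bool = False  # presuspend has run, device not yet suspended
    suspended: bool = False   # device is fully suspended

    def ioctl(self):
        # A fully suspended device reports EAGAIN; one that is only
        # preparing to suspend still forwards the ioctl to the current path.
        if self.suspended:
            return "EAGAIN"
        return "OK" if self.paths[self.current] else "EIO"

    def probe_paths(self, skip_when_suspending):
        # skip_when_suspending=True models the early exit after presuspend;
        # False models exiting early only when fully suspended.
        if self.suspended or (skip_when_suspending and self.suspending):
            return "skipped"
        for name, works in self.paths.items():
            if works:
                self.current = name  # switch to the first working path
                break
        return "probed"


def caller(dev, skip_when_suspending, suspend_after, max_tries=100):
    """Retry loop of a caller; returns (result, ioctls issued)."""
    for attempt in range(1, max_tries + 1):
        if attempt > suspend_after:
            dev.suspended = True  # the suspend finally completes
        result = dev.ioctl()
        if result != "EIO":
            return result, attempt
        dev.probe_paths(skip_when_suspending)
    return "EIO", max_tries


def make_dev():
    # One failed path currently in use, one working path elsewhere,
    # and the device has already passed presuspend.
    return ToyMpath(paths={"sda": False, "sdb": True},
                    current="sda", suspending=True)


old = caller(make_dev(), skip_when_suspending=True, suspend_after=5)
new = caller(make_dev(), skip_when_suspending=False, suspend_after=5)
print("skip after presuspend:", old)
print("skip only when suspended:", new)
```

In this model, the first variant keeps reissuing the failing ioctl until the
suspend completes, while the second switches to the working path after a
single probe.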