On Thu, 2020-01-16 at 20:18 -0600, Benjamin Marzinski wrote:
> multipath will try to get the priority from a PATH_DOWN path, if the
> path doesn't currently have a valid priority. However, if the
> priority
> code needs to contact the device to get the priority, this is likely
> to
> fail for PATH_DOWN paths.  This code dates back to when multipathd
> could
> not easily reload device tables with failed paths, so getting the
> correct priority was important to have a correctly configured device.
> Now multipathd can simply reload the device to move the path to the
> correct pathgroup when the path comes back up.  Since there are a
> number
> of prioritizers that don't require talking to the device, multipath
> shouldn't completely skip attempting to get the priority of these
> paths,
> but it should set a small timeout, so that it isn't hanging in the
> case where it needs to contact a device through a failed path.
> 
> Signed-off-by: Benjamin Marzinski <[email protected]>
> ---
>  libmultipath/discovery.c | 14 ++++++--------
>  libmultipath/prio.c      |  2 +-
>  2 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 1ab093f4..2c331db8 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1661,11 +1661,10 @@ get_state (struct path * pp, struct config
> *conf, int daemon, int oldstate)
>  }
>  
>  static int
> -get_prio (struct path * pp)
> +get_prio (struct path * pp, int timeout)
>  {
>       struct prio * p;
>       struct config *conf;
> -     int checker_timeout;
>       int old_prio;
>  
>       if (!pp)
> @@ -1684,11 +1683,8 @@ get_prio (struct path * pp)
>                       return 1;
>               }
>       }
> -     conf = get_multipath_config();
> -     checker_timeout = conf->checker_timeout;
> -     put_multipath_config(conf);
>       old_prio = pp->priority;
> -     pp->priority = prio_getprio(p, pp, checker_timeout);
> +     pp->priority = prio_getprio(p, pp, timeout);
>       if (pp->priority < 0) {
>               /* this changes pp->offline, but why not */
>               int state = path_offline(pp);
> @@ -2095,11 +2091,13 @@ int pathinfo(struct path *pp, struct config
> *conf, int mask)
>  
>        /*
>         * Retrieve path priority, even for PATH_DOWN paths if it has
> never
> -       * been successfully obtained before.
> +       * been successfully obtained before. If path is down don't
> try
> +       * for too long.
>         */
>       if ((mask & DI_PRIO) && path_state == PATH_UP && strlen(pp-
> >wwid)) {
>               if (pp->state != PATH_DOWN || pp->priority ==
> PRIO_UNDEF) {
> -                     get_prio(pp);
> +                     get_prio(pp, (pp->state != PATH_DOWN)?
> +                                  (conf->checker_timeout * 1000) :
> 10);
>               }
>       }
>  
> diff --git a/libmultipath/prio.c b/libmultipath/prio.c
> index 87de1f97..21c1b092 100644
> --- a/libmultipath/prio.c
> +++ b/libmultipath/prio.c
> @@ -14,7 +14,7 @@ unsigned int get_prio_timeout(unsigned int
> checker_timeout,
>                             unsigned int default_timeout)
>  {
>       if (checker_timeout)
> -             return checker_timeout * 1000;
> +             return checker_timeout;
>       return default_timeout;
>  }
> 

This changes the unit of the first get_prio_timeout() argument from
seconds to milliseconds. While that's a good thing (it was questionable
design to have the same function take several "timeout" arguments in
different units), we should rename the argument there to avoid
confusion (checker_timeout's unit is seconds all around).

Apart from that, ACK.



--
dm-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/dm-devel

Reply via email to