Ping?

On 4/29/20 1:15 PM, Ross Lagerwall wrote:
> We saw an issue in a production server on a customer deployment where
> DLM 4.0.7 gets "stuck" and is unable to join new lockspaces.
> 
> See - https://lists.clusterlabs.org/pipermail/users/2019-January/016054.html
> 
> This was forwarded off list to David Teigland who responded thusly.
> 
> "
> Hi, thanks for the debugging info.  You've spent more time looking at
> this than I have, but from a first glance it seems to me that the
> initial problem (there may be multiple) is that in the kernel,
> lockspace.c do_uevent() does not sensibly handle the ERESTARTSYS error
> from wait_event_interruptible().  I think do_uevent() should continue
> waiting for a uevent result from userspace until it gets one, because
> the kernel can't do anything sensible until it gets that.
> 
> Dave
> "
> 
> The previous attempt at fixing this was NAKed by Linus since it could
> cause a busy-wait loop. Instead, just switch wait_event_interruptible()
> to wait_event().
> 
> Signed-off-by: Ross Lagerwall <[email protected]>
> ---
>  fs/dlm/lockspace.c | 18 ++++--------------
>  1 file changed, 4 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
> index afb8340918b8..e93670ecfae5 100644
> --- a/fs/dlm/lockspace.c
> +++ b/fs/dlm/lockspace.c
> @@ -197,8 +197,6 @@ static struct kset *dlm_kset;
>  
>  static int do_uevent(struct dlm_ls *ls, int in)
>  {
> -     int error;
> -
>       if (in)
>               kobject_uevent(&ls->ls_kobj, KOBJ_ONLINE);
>       else
> @@ -209,20 +207,12 @@ static int do_uevent(struct dlm_ls *ls, int in)
>       /* dlm_controld will see the uevent, do the necessary group management
>          and then write to sysfs to wake us */
>  
> -     error = wait_event_interruptible(ls->ls_uevent_wait,
> -                     test_and_clear_bit(LSFL_UEVENT_WAIT, &ls->ls_flags));
> +     wait_event(ls->ls_uevent_wait,
> +                test_and_clear_bit(LSFL_UEVENT_WAIT, &ls->ls_flags));
>  
> -     log_rinfo(ls, "group event done %d %d", error, ls->ls_uevent_result);
> -
> -     if (error)
> -             goto out;
> +     log_rinfo(ls, "group event done %d", ls->ls_uevent_result);
>  
> -     error = ls->ls_uevent_result;
> - out:
> -     if (error)
> -             log_error(ls, "group %s failed %d %d", in ? "join" : "leave",
> -                       error, ls->ls_uevent_result);
> -     return error;
> +     return ls->ls_uevent_result;
>  }
>  
>  static int dlm_uevent(struct kset *kset, struct kobject *kobj,
>
