Ping?
On 4/29/20 1:15 PM, Ross Lagerwall wrote:
> We saw an issue on a production server in a customer deployment where
> DLM 4.0.7 gets "stuck" and is unable to join new lockspaces.
>
> See - https://lists.clusterlabs.org/pipermail/users/2019-January/016054.html
>
> This was forwarded off list to David Teigland who responded thusly.
>
> "
> Hi, thanks for the debugging info. You've spent more time looking at
> this than I have, but from a first glance it seems to me that the
> initial problem (there may be multiple) is that in the kernel,
> lockspace.c do_uevent() does not sensibly handle the ERESTARTSYS error
> from wait_event_interruptible(). I think do_uevent() should continue
> waiting for a uevent result from userspace until it gets one, because
> the kernel can't do anything sensible until it gets that.
>
> Dave
> "
>
> The previous attempt at fixing this was NAKed by Linus since it could
> cause a busy-wait loop. Instead, just switch wait_event_interruptible()
> to wait_event().
>
> Signed-off-by: Ross Lagerwall <[email protected]>
> ---
> fs/dlm/lockspace.c | 18 ++++--------------
> 1 file changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
> index afb8340918b8..e93670ecfae5 100644
> --- a/fs/dlm/lockspace.c
> +++ b/fs/dlm/lockspace.c
> @@ -197,8 +197,6 @@ static struct kset *dlm_kset;
>
>  static int do_uevent(struct dlm_ls *ls, int in)
>  {
> -	int error;
> -
>  	if (in)
>  		kobject_uevent(&ls->ls_kobj, KOBJ_ONLINE);
>  	else
> @@ -209,20 +207,12 @@ static int do_uevent(struct dlm_ls *ls, int in)
>  	/* dlm_controld will see the uevent, do the necessary group management
>  	   and then write to sysfs to wake us */
>
> -	error = wait_event_interruptible(ls->ls_uevent_wait,
> -			test_and_clear_bit(LSFL_UEVENT_WAIT, &ls->ls_flags));
> +	wait_event(ls->ls_uevent_wait,
> +		   test_and_clear_bit(LSFL_UEVENT_WAIT, &ls->ls_flags));
>
> -	log_rinfo(ls, "group event done %d %d", error, ls->ls_uevent_result);
> -
> -	if (error)
> -		goto out;
> +	log_rinfo(ls, "group event done %d", ls->ls_uevent_result);
>
> -	error = ls->ls_uevent_result;
> - out:
> -	if (error)
> -		log_error(ls, "group %s failed %d %d", in ? "join" : "leave",
> -			  error, ls->ls_uevent_result);
> -	return error;
> +	return ls->ls_uevent_result;
>  }
>
>  static int dlm_uevent(struct kset *kset, struct kobject *kobj,
>
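
For anyone skimming the thread, the change in what do_uevent() returns can be
modelled in plain C roughly as follows. This is only a sketch and not kernel
code: the model_* function names and MODEL_ERESTARTSYS are hypothetical, and
wait_rc simply stands in for whatever wait_event_interruptible() returned.

```c
/* Same numeric value as the kernel-internal ERESTARTSYS; redefined here
 * under a hypothetical name because userspace headers do not export it. */
#define MODEL_ERESTARTSYS 512

/* Model of the old return path: wait_rc stands in for the return value of
 * wait_event_interruptible(). A non-zero value (signal delivered) is
 * returned as-is and the result from userspace is never consulted. */
static int model_old_do_uevent(int wait_rc, int uevent_result)
{
	if (wait_rc)
		return wait_rc;
	return uevent_result;
}

/* Model of the new return path: wait_event() has no error return, so the
 * only possible outcome is the result written back by userspace. */
static int model_new_do_uevent(int uevent_result)
{
	return uevent_result;
}
```

In this model, model_old_do_uevent(-MODEL_ERESTARTSYS, 0) returns -512 even
though the userspace result was 0, while model_new_do_uevent(0) returns 0.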