Bob,

On Tue, Aug 2, 2022 at 7:58 PM Bob Peterson <rpete...@redhat.com> wrote:
> There are a couple places in function do_xmote where normal processing
> is circumvented due to withdraws in progress. However, since we bypass
> most of do_xmote() we bypass telling dlm to lock the dlm lock, which
> means dlm will never respond with a completion callback. Since the
> completion callback ordinarily clears GLF_LOCK, this patch changes
> function do_xmote to handle those situations more gracefully so the
> file system may be unmounted after withdraw.
>
> A very similar situation happens with the GLF_DEMOTE_IN_PROGRESS flag,
> which is cleared by function finish_xmote(). Since the withdraw causes
> us to skip the majority of do_xmote, it therefore also skips the call
> to finish_xmote() so the DEMOTE_IN_PROGRESS flag needs to be cleared
> manually as well.
>
> Signed-off-by: Bob Peterson <rpete...@redhat.com>
> ---
>  fs/gfs2/glock.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 0bfecffd71f1..d508d8fa0838 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -59,6 +59,8 @@ typedef void (*glock_examiner) (struct gfs2_glock * gl);
>
>  static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);
>  static void __gfs2_glock_dq(struct gfs2_holder *gh);
> +static void handle_callback(struct gfs2_glock *gl, unsigned int state,
> +                           unsigned long delay, bool remote);
>
>  static struct dentry *gfs2_root;
>  static struct workqueue_struct *glock_workqueue;
> @@ -762,8 +764,21 @@ __acquires(&gl->gl_lockref.lock)
>         int ret;
>
>         if (target != LM_ST_UNLOCKED && glock_blocked_by_withdraw(gl) &&
> -           gh && !(gh->gh_flags & LM_FLAG_NOEXP))
> +           gh && !(gh->gh_flags & LM_FLAG_NOEXP)) {
> +               /*
> +                * We won't tell dlm to perform the lock, so we won't get a
> +                * reply that would otherwise clear GLF_LOCK. So we clear it.
> +                */
> +               handle_callback(gl, LM_ST_UNLOCKED, 0, false);
> +               clear_bit(GLF_LOCK, &gl->gl_flags);
> +               clear_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
> +               /*
> +                * Don't increment lockref here. The next time the worker runs it will do
> +                * glock_put, which will decrement it to 0, and free the glock.
> +                */

I don't understand the reference counting logic here: where does the
alleged reference that we're passing on to the work function come
from?

Note that further below in do_xmote(), we're calling gfs2_glock_hold()
followed by gfs2_glock_queue_work(), so the reference counting logic
seems normal there -- except that when ->lm_lock returns an error,
we're apparently leaking a reference. So maybe the gfs2_glock_hold()
should be moved right in front of the gfs2_glock_queue_work() calls to
make the code less fragile?
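
To illustrate the concern, here is a toy user-space model. It is not the
gfs2 code: every toy_* name is hypothetical, the refcount is a plain int,
and "queueing work" is just a flag. The one rule it assumes is that queueing
work hands one reference to the work function, which drops it when it runs.
Under that assumption, taking the reference early leaks it on the error
path, while taking it right before queueing does not:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a glock: a reference count and a queued flag. */
struct toy_glock {
	int refcount;		/* loosely models gl_lockref.count */
	bool work_queued;	/* loosely models pending glock work */
};

static void toy_hold(struct toy_glock *gl)
{
	gl->refcount++;
}

static void toy_put(struct toy_glock *gl)
{
	assert(gl->refcount > 0);
	gl->refcount--;
}

/*
 * Model assumption: the caller passes one reference to the work function.
 * If work is already queued, that work already owns a reference, so the
 * extra one is dropped here.
 */
static void toy_queue_work(struct toy_glock *gl)
{
	if (gl->work_queued) {
		toy_put(gl);
		return;
	}
	gl->work_queued = true;
}

/* The work function consumes the reference it was handed. */
static void toy_run_work(struct toy_glock *gl)
{
	if (!gl->work_queued)
		return;
	gl->work_queued = false;
	toy_put(gl);
}

/* Fragile ordering: the hold is far from the queue call. */
static int toy_xmote_early_hold(struct toy_glock *gl, int lm_lock_ret)
{
	toy_hold(gl);
	if (lm_lock_ret < 0)
		return lm_lock_ret;	/* reference is never handed off */
	toy_queue_work(gl);
	return 0;
}

/* Suggested ordering: hold immediately before queueing the work. */
static int toy_xmote_late_hold(struct toy_glock *gl, int lm_lock_ret)
{
	if (lm_lock_ret < 0)
		return lm_lock_ret;	/* nothing taken, nothing to leak */
	toy_hold(gl);
	toy_queue_work(gl);
	return 0;
}
```

With an initial count of 1 and a failing lock call, the early-hold variant
ends at 2 even after the work function has had a chance to run, while the
late-hold variant stays at 1. Keeping each hold next to the queue call it
pairs with makes that pairing visible at a glance.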

> +               __gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
>                 return;
> +       }
>         lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
>                       LM_FLAG_PRIORITY);
>         GLOCK_BUG_ON(gl, gl->gl_state == target);
> @@ -848,6 +863,8 @@ __acquires(&gl->gl_lockref.lock)
>             (target != LM_ST_UNLOCKED ||
>              test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags))) {
>                 if (!is_system_glock(gl)) {
> +                       clear_bit(GLF_LOCK, &gl->gl_flags);
> +                       clear_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
>                         gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
>                         goto out;
>                 } else {
> --
> 2.36.1
>

Thanks,
Andreas
