On Fri, Jan 7, 2022 at 5:44 PM Bob Peterson <rpete...@redhat.com> wrote:
> Function gfs2_upgrade_iopen_glock tries to upgrade the iopen glock
> first with a "try" lock and then with an ASYNC request with timeout.
>
> Before this patch, if a timeout occurred, gfs2 dequeued the glock and
> went on its way. However, if the dlm lock request is never canceled, it
> could get stuck on dlm's "Convert" queue forever trying to convert the
> lock to EX. In some cases this led to complete cluster deadlock because
> the lock could not be granted in EX until demoted, and it couldn't be
> demoted because other nodes have the glock held in SH (until eviction).
> Dlm never took it off its conversion queue because the lock request was
> never canceled by gfs2, which let it run its course.
Hmm, before this patch, gfs2 did this:

	gfs2_holder_reinit(LM_ST_EXCLUSIVE, GL_ASYNC | GL_NOCACHE, gh);
	error = gfs2_glock_nq(gh);

	/* Wait for gh acquisition or timeout */

	if (!test_bit(HIF_HOLDER, &gh->gh_iflags))
		gfs2_glock_dq(gh);

The idea of the gfs2_glock_dq() is to cancel any pending dlm requests
that relate to this locking request. The "try" lock request that
precedes this doesn't seem to have anything to do with the problem at
hand.

> This patch makes gfs2 cancel dlm requests for iopen glocks if they time
> out. That allows dlm to remove it from its Conversion queue and grant
> the glock to other compatible SH requesters.
>
> The dlm responds to the cancel request by sending gfs2 an AST callback
> when the lock request is properly canceled. If another request is made
> for the dlm lock, dlm returns -EBUSY until the AST is sent. Therefore,
> this patch changes it from glock_dq to glock_dq_wait after the timeout,
> thereby waiting for the glock state machine to progress after receiving
> the AST from dlm.

I'd expect gfs2_glock_dq() to do the canceling and waiting itself:
those are implementation details that really shouldn't be exposed to
glock users.

Also, keep in mind that it's possible for the lock to be granted
between the HIF_HOLDER check and the gfs2_glock_dq() in
gfs2_upgrade_iopen_glock(), whereas the glock layer can take the
necessary locks to prevent that race from happening.

We probably also need to be more careful with the canceling when there
are multiple pending holders in the queue. When a holder is dequeued,
we only want to cancel the current dlm request if it is on behalf of
that holder. I'm not entirely sure how the glock layer keeps track of
that mapping: are pending dlm requests always on behalf of the first
holder in the queue?

> The purpose of the "try" lock and glock timeout was to prevent blocking
> on the lock during evict, since the order of the iopen and inode glocks
> cannot be guaranteed.
> However, waiting for dlm to return the cancel AST
> should be relatively quick and guaranteed, so should never result in
> deadlock.

Canceling doesn't acquire any resources, so I don't see how it could
possibly deadlock.

Thanks,
Andreas

> Signed-off-by: Bob Peterson <rpete...@redhat.com>
> ---
>  fs/gfs2/glock.c  |  5 +++++
>  fs/gfs2/incore.h |  1 +
>  fs/gfs2/super.c  | 14 +++++++++++++-
>  3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index b7ab8430333c..0874e821ab29 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -672,6 +672,9 @@ static void finish_xmote(struct gfs2_glock *gl, unsigned int ret)
>
>  	/* Check for state != intended state */
>  	if (unlikely(state != gl->gl_target)) {
> +		if ((ret & LM_OUT_CANCELED) &&
> +		    test_bit(GLF_CANCELING, &gl->gl_flags))
> +			goto out;
>  		if (gh && !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags)) {
>  			/* move to back of queue and try next entry */
>  			if (ret & LM_OUT_CANCELED) {
> @@ -2365,6 +2368,8 @@ static const char *gflags2str(char *buf, const struct gfs2_glock *gl)
>  		*p++ = 'n';
>  	if (test_bit(GLF_INSTANTIATE_IN_PROG, gflags))
>  		*p++ = 'N';
> +	if (test_bit(GLF_CANCELING, gflags))
> +		*p++ = 'c';
>  	*p = 0;
>  	return buf;
>  }
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index 8c00fb389ae5..3572b4a2a831 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -329,6 +329,7 @@ enum {
>  	GLF_LRU				= 13,
>  	GLF_OBJECT			= 14, /* Used only for tracing */
>  	GLF_BLOCKING			= 15,
> +	GLF_CANCELING			= 16, /* Now canceling lock request */
>  	GLF_PENDING_DELETE		= 17,
>  	GLF_FREEING			= 18, /* Wait for glock to be freed */
>  };
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 143a47359d1b..e23bc7de5248 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1193,7 +1193,19 @@ static bool gfs2_upgrade_iopen_glock(struct inode *inode)
>  		   test_bit(GLF_DEMOTE, &ip->i_gl->gl_flags),
>  		   timeout);
>  	if (!test_bit(HIF_HOLDER, &gh->gh_iflags)) {
> -		gfs2_glock_dq(gh);
> +		if (sdp->sd_lockstruct.ls_ops->lm_cancel) {
> +			set_bit(GLF_CANCELING, &gh->gh_gl->gl_flags);
> +			sdp->sd_lockstruct.ls_ops->lm_cancel(gh->gh_gl);
> +		}
> +		/*
> +		 * We canceled the dlm lock, so we must wait for dlm to send
> +		 * us the AST callback, otherwise we will get -EBUSY if anyone
> +		 * tries to lock the glock again while the dlm lock is still
> +		 * on the dlm conversion queue.
> +		 */
> +		gfs2_glock_dq_wait(gh);
> +		clear_bit(GLF_CANCELING, &gh->gh_gl->gl_flags);
> +		smp_mb__after_atomic();
>  		return false;
>  	}
>  	return true;
> --
> 2.33.1
>