Recently, we introduced patches that time out lock requests that take too
long, specifically iopen glock requests caught in ABBA deadlocks during
evict. Before this patch set, gfs2 never canceled requests that timed out,
which can lead to deadlocks because dlm keeps those requests on its
Conversion queue.

To handle timed-out requests properly and have dlm remove them from its
Conversion queue, gfs2 must send dlm an unlock request with the dlm
"cancel" flag, and it must then handle the AST that dlm sends back. This
AST may be a successful response to the cancel request, or it may be a
successful AST from the original lock request. gfs2 had bugs in handling
both cases.
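To make the two possible outcomes concrete, here is a small self-contained
model of the request lifecycle. It is a sketch, not gfs2 or dlm code: the
state names, struct sim_req, sim_timeout(), sim_ast(), and SIM_ECANCEL are
all hypothetical, with SIM_ECANCEL modeled loosely on dlm's DLM_ECANCEL
status. The key point it illustrates is that sending the cancel does not
finish the request; only the AST that follows decides whether the lock was
canceled or granted after all.

```c
#include <assert.h>

/* Hypothetical status code, modeled loosely on dlm's DLM_ECANCEL;
 * not the real dlm definition. */
#define SIM_ECANCEL 0x10001

enum req_state {
	REQ_WAITING,     /* request queued, no AST yet */
	REQ_CANCEL_SENT, /* timed out; cancel sent, still awaiting an AST */
	REQ_GRANTED,     /* original request won the race with the cancel */
	REQ_CANCELED,    /* cancel succeeded; request removed from the queue */
	REQ_FAILED       /* any other completion status */
};

/* Hypothetical request tracker; not a gfs2 structure. */
struct sim_req {
	enum req_state state;
};

/* Timeout fired: send a cancel. The request remains outstanding until
 * the AST arrives, so the state only moves to REQ_CANCEL_SENT. */
static void sim_timeout(struct sim_req *r)
{
	if (r->state == REQ_WAITING)
		r->state = REQ_CANCEL_SENT;
}

/* Completion AST. Status 0 means the original request was granted
 * (even if a cancel was already sent, so the caller now holds the lock
 * and must use or release it); -SIM_ECANCEL means the cancel won. */
static void sim_ast(struct sim_req *r, int status)
{
	assert(r->state == REQ_WAITING || r->state == REQ_CANCEL_SENT);
	if (status == 0)
		r->state = REQ_GRANTED;
	else if (status == -SIM_ECANCEL)
		r->state = REQ_CANCELED;
	else
		r->state = REQ_FAILED;
}
```

For example, after sim_timeout(), an AST with status -SIM_ECANCEL leaves the
request in REQ_CANCELED, while an AST with status 0 leaves it in
REQ_GRANTED, the case a naive "timed out means failed" assumption gets
wrong.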

This patch set attempts to fix the problem by sending dlm a cancel request
and handling the resulting AST. It also avoids some ABBA deadlocks by
switching the order in which gfs2 takes its inode and iopen glocks, which
previously differed between some lookups and evicts.

While debugging this, we discovered that dlm rejects lock requests with
-EBUSY while a cancel is still in progress. Ideally dlm would wait until it
is no longer busy, but until we find a proper dlm-based solution, gfs2
needs to retry the request.
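A minimal sketch of the stop-gap retry follows, assuming the lock request
can be expressed as a callback that returns 0 or a negative errno. The
names lock_fn, lock_retry_on_ebusy(), busy_n_times(), and the max_tries
bound are hypothetical and not taken from the patch; a real implementation
in the kernel would likely also back off between attempts, which this
sketch omits.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock-request callback: 0 on success, negative errno on error. */
typedef int (*lock_fn)(void *ctx);

/* Resubmit the request while the lock manager reports -EBUSY (for
 * example, because a cancel is still in flight). Any other result,
 * success or a different error, is returned immediately. max_tries
 * bounds the loop in this sketch. */
static int lock_retry_on_ebusy(lock_fn fn, void *ctx, int max_tries)
{
	int error = -EBUSY;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries && error == -EBUSY; i++)
		error = fn(ctx);
	return error;
}

/* Demonstration callback: reports -EBUSY for the first *remaining
 * calls, then succeeds. */
static int busy_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EBUSY;
	}
	return 0;
}
```

With two busy responses and a limit of five attempts, the third call
succeeds and the function returns 0; if the callback stays busy past the
limit, -EBUSY is passed back to the caller.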

Andreas Gruenbacher (2):
  gfs2: cancel timed-out glock requests
  gfs2: Switch lock order of inode and iopen glock

Bob Peterson (1):
  gfs2: Retry on dlm -EBUSY (stop gap)

 fs/gfs2/glock.c    | 11 +++++++++++
 fs/gfs2/inode.c    | 49 +++++++++++++++++++++++++---------------------
 fs/gfs2/lock_dlm.c | 22 ++++++++++++++-------
 3 files changed, 53 insertions(+), 29 deletions(-)

-- 
2.34.1
