Recently, we introduced patches to time out lock requests that take too long, specifically for iopen glocks caught in ABBA deadlocks during evict. Before this patch set, gfs2 never canceled the requests that timed out, which can lead to deadlocks due to dlm keeping the requests on its Conversion queue.

To deal with the timed-out requests properly, and have dlm remove them from its Conversion queue, gfs2 must send dlm an unlock request with the dlm "cancel" flag, and then it also must deal with the AST that dlm sends back. This AST may be a successful response to the Cancel request, or it may be a successful AST from the original lock request. Either way, gfs2 had bugs dealing with the situation.

This patch set attempts to fix the problem by sending DLM a cancel and reacting to its AST. Some ABBA deadlocks were avoided by switching the order in which gfs2 takes its inode and iopen glocks, which was different between some lookups and evicts.

In the process of debugging this, we discovered a problem whereby dlm will reject lock requests with -EBUSY while the request is being canceled. We want dlm to wait until it's not busy, but until we find a proper dlm based solution, we need to retry the request.

Andreas Gruenbacher (2):
  gfs2: cancel timed-out glock requests
  gfs2: Switch lock order of inode and iopen glock

Bob Peterson (1):
  gfs2: Retry on dlm -EBUSY (stop gap)

 fs/gfs2/glock.c    | 11 +++++++++++
 fs/gfs2/inode.c    | 49 +++++++++++++++++++++++++++----------------------
 fs/gfs2/lock_dlm.c | 22 +++++++++++++++-------
 3 files changed, 53 insertions(+), 29 deletions(-)

--
2.34.1