Hi,

On 25/11/15 14:22, Bob Peterson wrote:
Hi,

On 19/11/15 18:42, Bob Peterson wrote:
This patch changes function gfs2_clear_inode() so that instead
of calling gfs2_glock_put() directly most of the time, it queues
the glock to the delayed work queue. That avoids a possible
deadlock where it calls dlm during a fence operation:
dlm waits for a fence operation, the fence operation waits for
memory, the shrinker waits for gfs2 to free an inode from memory,
but gfs2 waits for dlm.

Signed-off-by: Bob Peterson <rpete...@redhat.com>
---
   fs/gfs2/glock.c | 34 +++++++++++++++++-----------------
   fs/gfs2/glock.h |  1 +
   fs/gfs2/super.c |  5 ++++-
   3 files changed, 22 insertions(+), 18 deletions(-)
[snip]
Most of the patch seems to just rename the workqueue, which makes it
tricky to spot the other changes. However, the code below seems to be
the new bit:

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 9d5c3f7..46e5004 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -24,6 +24,7 @@
   #include <linux/crc32.h>
   #include <linux/time.h>
   #include <linux/wait.h>
+#include <linux/workqueue.h>
   #include <linux/writeback.h>
   #include <linux/backing-dev.h>
   #include <linux/kernel.h>
@@ -1614,7 +1615,9 @@ out:
        ip->i_gl->gl_object = NULL;
        flush_delayed_work(&ip->i_gl->gl_work);
        gfs2_glock_add_to_lru(ip->i_gl);
-       gfs2_glock_put(ip->i_gl);
+       if (queue_delayed_work(gfs2_glock_workqueue,
+                              &ip->i_gl->gl_work, 0) == 0)
+               gfs2_glock_put(ip->i_gl);
        ip->i_gl = NULL;
        if (ip->i_iopen_gh.gh_gl) {
                ip->i_iopen_gh.gh_gl->gl_object = NULL;
which replaces a put with a queue, plus a put if the queue fails (due to
the work already being on the queue). That doesn't look quite right to
me: if calling gfs2_glock_put() was not safe before, then calling it
conditionally like this is still no safer, I think?

Steve.
Hi,

The call to gfs2_glock_put() in this case should be safe.

If queuing the delayed work fails, it means the work is already queued and
holds its own reference, so the glock reference count is greater than 1;
that reference is decremented when the glock state machine runs.
Which means this can't be the final glock_put().
Which means we can't possibly call into DLM, which means we can't block.
Which means it's safe.

Regards,

Bob Peterson
Red Hat File Systems

There is no reason that this cannot be the final glock put: there is no synchronization with the work that has already been queued, so it might well have run and decremented the ref count before we return from the queuing function. It is unlikely that will be the case, but it is still possible.
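
The interleaving in question can be replayed with a minimal user-space sketch. This is not kernel code: struct toy_glock and the toy_* helpers are hypothetical stand-ins for the glock, queue_delayed_work(), gfs2_glock_put() and the glock work function, and the sketch assumes the already-queued work item holds exactly one reference of its own. The "race" is forced deterministically by a flag rather than by real threads.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a glock: a reference count plus a
 * "work pending" flag. Not the real struct gfs2_glock. */
struct toy_glock {
	int ref;            /* outstanding references */
	bool work_pending;  /* a queued work item holds one reference */
};

/* Models queue_delayed_work(): returns false if already pending. */
static bool toy_queue_work(struct toy_glock *gl)
{
	if (gl->work_pending)
		return false;
	gl->work_pending = true;
	return true;
}

/* Drops one reference; returns true if this was the final put,
 * i.e. the point where the real code could end up calling into DLM. */
static bool toy_put(struct toy_glock *gl)
{
	return --gl->ref == 0;
}

/* Models the queued work function running and dropping its reference. */
static bool toy_run_work(struct toy_glock *gl)
{
	gl->work_pending = false;
	return toy_put(gl);
}

/* Replays the patched path in the clear-inode code. The glock starts
 * with two references (the inode's and the pending work's). If
 * work_runs_first is true, the pending work runs between the failed
 * queue attempt and the caller's put. Returns true if the caller's
 * put turned out to be the final one. */
static bool caller_put_is_final(bool work_runs_first)
{
	struct toy_glock gl = { .ref = 2, .work_pending = true };
	bool final = false;

	if (!toy_queue_work(&gl)) {          /* already on the queue */
		if (work_runs_first)
			toy_run_work(&gl);   /* ref 2 -> 1, unsynchronized */
		final = toy_put(&gl);        /* ref 1 -> 0 in that case */
	}
	return final;
}
```

With the work still pending, caller_put_is_final(false) returns false, which matches the "refcount is greater than 1" argument. With the work having run first, caller_put_is_final(true) returns true, so the conditional put is the final one.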

Steve.
