On 08/10/18 13:59, Mark Syms wrote:
That sounds entirely reasonable, so long as you are absolutely sure that nothing 
is ever going to mess with that glock. We erred on the side of more caution, not 
knowing whether it would be guaranteed safe or not.

Thanks,

        Mark
We should have a look at the history to see how that wait got added. However, the "dead" flag here means "don't touch this glock" and is there so that we can separate marking the glock dead from its actual removal from the list (which simplifies the locking during the scanning procedures).
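
For readers unfamiliar with the pattern being described, a rough sketch of how marking dead and removal can be decoupled with the kernel's lockref API follows; struct obj and its functions are simplified placeholders, not the actual GFS2 glock code:

    #include <linux/lockref.h>
    #include <linux/spinlock.h>

    /* Simplified placeholder for a glock-like object (not GFS2 code). */
    struct obj {
            struct lockref ref;     /* holds both the spinlock and the count */
            /* ... hash table linkage, payload ... */
    };

    /* Phase 1: mark the object dead under its lockref spinlock.  Any lookup
     * racing with this sees lockref_get_not_dead() fail and knows not to
     * touch the object. */
    static void obj_kill(struct obj *o)
    {
            spin_lock(&o->ref.lock);
            lockref_mark_dead(&o->ref);
            spin_unlock(&o->ref.lock);
    }

    /* Phase 2, possibly much later (e.g. from a scanning pass): unlink the
     * dead object from the hash table and free it.  Keeping this separate
     * from phase 1 means the scan only needs its own locking. */
    static void obj_remove(struct obj *o)
    {
            /* hash table removal and freeing would go here */
    }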

Steve.

-----Original Message-----
From: Steven Whitehouse <swhit...@redhat.com>
Sent: 08 October 2018 13:56
To: Mark Syms <mark.s...@citrix.com>; cluster-devel@redhat.com
Cc: Ross Lagerwall <ross.lagerw...@citrix.com>; Tim Smith <tim.sm...@citrix.com>
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

Hi,


On 08/10/18 13:36, Mark Syms wrote:
During a VM stress test we encountered a system lockup, and kern.log contained:

kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds.
kernel: [21389.462749] Tainted: G O 4.4.0+10 #1
kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000
kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 ffff88005a509c00 ffff88019628c000
kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 fffffffffffffe58 ffff8801818da418
kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68
kernel: [21389.462803] Call Trace:
kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80
kernel: [21389.462877] [<ffffffffa0663624>] find_insert_glock+0x4a4/0x530 [gfs2]
kernel: [21389.462891] [<ffffffffa0660c20>] ? gfs2_holder_wake+0x20/0x20 [gfs2]
kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 [gfs2]
kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2]
kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 [gfs2]
kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120
kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 [gfs2]
kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170
kernel: [21389.462948] [<ffffffff815a57ee>] entry_SYSCALL_64_fastpath+0x12/0x71

On examination of the code it was determined that this code path is only
taken if the selected glock is marked as dead. The supposition, therefore,
is that by the time schedule() was called the glock had already been
cleaned up, and so nothing ever woke the sleeping task. Instead of calling
schedule(), call schedule_timeout(HZ) so that at least we get a chance to
re-evaluate.

On repeating the stress test, the printk message was seen once in the logs
across four servers, with no further occurrences and no stuck-task log
entries. This indicates that when the timeout occurred the code repeated
the lookup and did not find the same glock entry; since we had not been
woken by that point, we would never have been woken.

Signed-off-by: Mark Syms <mark.s...@citrix.com>
Signed-off-by: Tim Smith <tim.sm...@citrix.com>
---
   fs/gfs2/glock.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4614ee2..0a59a01 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name,
 	}
 	if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
 		rcu_read_unlock();
-		schedule();
+		if (schedule_timeout(HZ) == 0)
+			printk(KERN_INFO "find_insert_glock schedule timed out\n");
 		goto again;
 	}
 out:
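
For reference, schedule_timeout() returns the number of jiffies remaining if the task was woken before the timeout expired, and 0 if the full timeout elapsed, so the message added above only fires when no wakeup arrived within HZ jiffies (roughly one second). A minimal illustration of that contract, using a hypothetical helper name rather than GFS2 code:

    #include <linux/sched.h>
    #include <linux/jiffies.h>

    /* Hypothetical helper, not GFS2 code: sleep for up to one second unless
     * woken earlier.  The caller must already have set the task state, for
     * example via prepare_to_wait(), before calling schedule_timeout(). */
    static bool demo_timed_out_after_one_second(void)
    {
            long remaining = schedule_timeout(HZ);

            return remaining == 0; /* 0 means the full timeout elapsed, no wakeup */
    }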
That is a bit odd. In fact that whole function looks odd. I wonder why it needs 
to wait in the first place. It should be a simple comparison here. If the glock 
is dead then nothing else should touch it, so we are safe to add a new one into 
the hash table. The wait is almost certainly a bug.
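
Whatever the history turns out to be, the hang in the trace above looks like the classic lost-wakeup problem: a bare schedule() only returns if someone later issues a wake_up() on the same waitqueue, so if the dying glock is torn down without that wakeup ever reaching the waiter, the waiter sleeps forever. Below is a generic sketch of the hazard and of how a bounded sleep sidesteps it; the names (demo_wq, demo_cond) are hypothetical and this is not the GFS2 code:

    #include <linux/wait.h>
    #include <linux/sched.h>

    static DECLARE_WAIT_QUEUE_HEAD(demo_wq);
    static bool demo_cond;

    /* Waiter.  prepare_to_wait() queues the task and sets its state before
     * the condition is re-checked, so a wake_up() issued after the check is
     * not lost.  A bare schedule() here would still hang forever if the
     * waker side never calls wake_up() at all; schedule_timeout(HZ) turns
     * that into a retry after roughly one second instead. */
    static void demo_wait(void)
    {
            DEFINE_WAIT(wait);

            for (;;) {
                    prepare_to_wait(&demo_wq, &wait, TASK_UNINTERRUPTIBLE);
                    if (demo_cond)
                            break;
                    schedule_timeout(HZ); /* bounded: re-check even without a wakeup */
            }
            finish_wait(&demo_wq, &wait);
    }

    /* Waker: if this never runs, only the timeout gets the waiter going again. */
    static void demo_signal(void)
    {
            demo_cond = true;
            wake_up(&demo_wq);
    }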

Steve.

