[Cluster-devel] Any ideas on this?

Mark Syms Tue, 02 Oct 2018 08:58:52 -0700

Hi,

We've seen a couple of times in our automated tests that unmounting the GFS2 
filesystem gets stuck and doesn't complete. It has just happened again, and the 
stack for the umount process looks like this:


[<ffffffff81087968>] flush_workqueue+0x1c8/0x520
[<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
[<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
[<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
[<ffffffff811b7ff7>] kill_block_super+0x27/0x70
[<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
[<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
[<ffffffff811b79b9>] deactivate_super+0x59/0x60
[<ffffffff811d2998>] cleanup_mnt+0x58/0x80
[<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
[<ffffffff8108c87d>] task_work_run+0x7d/0xa0
[<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
[<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
[<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
[<ffffffffffffffff>] 0xffffffffffffffff

Querying mount or /proc/mounts no longer shows the gfs2 filesystem, so the 
umount has got part way through the process.
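
For anyone wanting to script the same check, here is a minimal sketch. It 
assumes the standard /proc/mounts layout (device, mount point, filesystem type, 
options, dump, pass); the function name gfs2_mounts is hypothetical and not 
part of any existing tool.

```python
def gfs2_mounts(mounts_text):
    """Return the mount points whose filesystem type is gfs2.

    mounts_text is the raw content of /proc/mounts (or a file in the
    same whitespace-separated format).
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 the mount point, field 2 the fs type.
        if len(fields) >= 3 and fields[2] == "gfs2":
            points.append(fields[1])
    return points
```

Passing it the text read from /proc/mounts on the stuck host should return an 
empty list if the state matches what is described above, even though the 
umount process itself has not exited.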

The test had just finished deleting a series of VMs, which will have had data 
written to them by multiple hosts, and the time gap between the last delete 
completing and the umount is small, so this might play a part in things.

"glocktop -d 1 -r -H" on the stuck host shows it trying to get locks from a 
different host in the cluster where the referenced pids are no longer present.

Unmounting the fs on the host where the locks are supposedly held was 
successful but then the stuck host started reporting that it was stuck waiting 
for locks held locally by kworker threads.

Is there anything in particular we should be looking at, or possibly a fix that 
we don't currently have in our kernel?

Thanks,

        Mark.
