Hi,
We've seen a couple of times in our automated tests that unmounting a GFS2
filesystem gets stuck and doesn't complete. It has just happened again, and the
stack for the umount process looks like this:
[<ffffffff81087968>] flush_workqueue+0x1c8/0x520
[<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
[<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
[<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
[<ffffffff811b7ff7>] kill_block_super+0x27/0x70
[<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
[<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
[<ffffffff811b79b9>] deactivate_super+0x59/0x60
[<ffffffff811d2998>] cleanup_mnt+0x58/0x80
[<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
[<ffffffff8108c87d>] task_work_run+0x7d/0xa0
[<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
[<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
[<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
[<ffffffffffffffff>] 0xffffffffffffffff
Neither mount nor /proc/mounts shows the gfs2 filesystem any longer, so the
unmount has got part way through the process.
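For anyone wanting to script the same check, here is a minimal sketch. It is
illustrative only and not our actual test code; the function name gfs2_mounts
is hypothetical. It assumes the usual /proc/mounts layout of whitespace-separated
fields (device, mount point, filesystem type, options, ...).

```python
def gfs2_mounts(mounts_text):
    """Return the mount points of all gfs2 entries in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "gfs2":
            found.append(fields[1])
    return found
```

Passing it the contents of /proc/mounts on the stuck host would be expected to
return an empty list, matching what we see by hand.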
The test had just finished deleting a series of VMs, which will have had data
written to them by multiple hosts, and the time gap between the last delete
completing and the umount is small, so this might play a part in things.
"glocktop -d 1 -r -H" on the stuck host shows it trying to get locks from a
different host in the cluster where the referenced pids are no longer present.
Unmounting the fs on the host where the locks were supposedly held was
successful, but the stuck host then started reporting that it was stuck waiting
for locks held locally by kworker threads.
Is there anything in particular we should be looking at, or possibly a fix
that we don't currently have in our kernel?
Thanks,
Mark.