Before this patch, function do_xmote just assumed that all the writes submitted to the journal had finished successfully, and it called the go_unlock function to release the dlm lock. If they had not, and a revoke failed to make its way to the journal, letting the go_inval function continue and telling dlm to release the glock to another node would cause corruption when that node replays the journal.

This patch adds a couple of gfs2_assert_withdraw calls in do_xmote, after the calls to go_sync and go_inval. The check after go_inval also asserts that the glock's AIL count (gl_ail_count) is zero, but only when no log errors have been recorded. If an assert fails, the resulting withdraw should cause another node to replay the journal before continuing, thus protecting rgrp and dinode glocks and maintaining the integrity of the metadata.
Signed-off-by: Bob Peterson <rpete...@redhat.com>
---
 fs/gfs2/glock.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index ba61bba46785..afb336b65abd 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -566,8 +566,12 @@ __acquires(&gl->gl_lockref.lock)
 	spin_unlock(&gl->gl_lockref.lock);
 	if (glops->go_sync)
 		glops->go_sync(gl);
+	gfs2_assert_withdraw(sdp, atomic_read(&sdp->sd_log_errors) == 0);
 	if (test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
 		glops->go_inval(gl, target == LM_ST_DEFERRED ? 0 : DIO_METADATA);
+
+	if (!gfs2_assert_withdraw(sdp, atomic_read(&sdp->sd_log_errors) == 0))
+		gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
 	clear_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
 
 	gfs2_glock_hold(gl);
-- 
2.20.1