Hi,

On 13/02/2019 15:21, Bob Peterson wrote:
Before this patch, function rgrp_go_inval would not invalidate the
ail list, which meant that there might still be buffers outstanding
on the ail that had revokes still pending. If the revokes had still
not been written when the glock was given to another node, and that
node (with outstanding revokes) died for some reason, the resulting
journal replay would replay the un-revoked rgrps, thus wiping out
changes made by the node who rightfully received the rgrp in EX.
This caused metadata corruption.

Signed-off-by: Bob Peterson <rpete...@redhat.com>

rgrp_go_sync() has a call to gfs2_ail_empty_gl() so there should be no revokes to worry about when rgrp_go_inval is called, because everything should have already been written to the log & flushed. Has something gone wrong in the logic which somehow allows that step to be skipped in some cases?

Steve.

---
  fs/gfs2/glops.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 9c86c8004ba7..4b0e52bf5825 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -166,6 +166,8 @@ static void rgrp_go_sync(struct gfs2_glock *gl)
        error = filemap_fdatawait_range(mapping, gl->gl_vm.start, 
gl->gl_vm.end);
        mapping_set_error(mapping, error);
        gfs2_ail_empty_gl(gl);
+       gfs2_assert_withdraw(gl->gl_name.ln_sbd,
+                            gl->gl_name.ln_sbd->sd_log_error == 0);
spin_lock(&gl->gl_lockref.lock);
        rgd = gl->gl_object;
@@ -196,6 +198,7 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
        WARN_ON_ONCE(!(flags & DIO_METADATA));
        gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
        truncate_inode_pages_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
+       gfs2_ail_empty_gl(gl);
if (rgd)
                rgd->rd_flags &= ~GFS2_RDF_UPTODATE;

Reply via email to