On 3/19/19 6:48 PM, Bob Peterson wrote:
----- Original Message -----
Hi,

Occasionally during testing, we see the following assertion failure in
log_pull_tail():

[ 1104.061245] gfs2: fsid=xapi-clusterd:2d2cc24c-c48a-ca.0: fatal:
assertion "atomic_read(&sdp->sd_log_blks_free) <=
sdp->sd_jdesc->jd_blocks" failed
[ 1104.061245]    function = log_pull_tail, file = fs/gfs2/log.c, line = 510

It always seems to happen shortly after journal recovery. I added some
debug logging at the point of the assertion failure and got the following:
(snip)
Any ideas about this?

Thanks,
--
Ross Lagerwall

Hi Ross,

I've been fighting with/debugging multiple recovery problems for a long time 
now.
I've done countless (well, okay, thousands of) recovery tests and I can tell
you that gfs2 recovery has some major problems. These problems usually don't
occur when you have a few gfs2 mounts: they're much more likely when you have
lots of gfs2 mounts. I'm using 32 mounts.

The problem you mentioned sounds vaguely familiar, but I can't find anything
directly related. Make sure all your journals are the same size, and see if
fsck.gfs2 complains about the journal. Otherwise, it could be a side effect of
one of the recovery issues I'm working on. Do you have other symptoms? Also,
make sure
multiple nodes aren't trying to use the same journal because of lock_nolock or
something...I've made that mistake in the past.


I think I've found the cause of the assertion I was hitting. Recovery sets sd_log_flush_head but does not take any locks, which means a concurrent call to gfs2_log_flush() can result in sd_log_head being set to sd_log_flush_head. A later call to gfs2_log_flush() will then hit an assertion failure in log_pull_tail() because the mismatch between sd_log_head and sd_log_tail means too many blocks are freed.
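
To make the failure mode concrete, here's a small standalone sketch (not the kernel code; the journal size, block numbers and helper names are all made up) of the circular-log accounting that the assertion protects. If the tail is "pulled" to a position that is actually behind the current tail, the modular distance counts most of the journal as freed and the free-block count overshoots the journal size:

/* Toy model of the journal's free-block accounting; the journal size,
 * block numbers and helper names are hypothetical, not the kernel's. */
#include <assert.h>
#include <stdio.h>

#define JD_BLOCKS 8192                  /* journal size in blocks */

static unsigned int log_tail = 6000;    /* oldest block still needed */
static unsigned int blks_free = 5000;   /* must never exceed JD_BLOCKS */

/* forward distance from 'older' to 'newer' on the circular log */
static unsigned int log_distance(unsigned int newer, unsigned int older)
{
        int dist = (int)newer - (int)older;

        if (dist < 0)
                dist += JD_BLOCKS;
        return (unsigned int)dist;
}

static void pull_tail(unsigned int new_tail)
{
        blks_free += log_distance(new_tail, log_tail);
        /* stand-in for the assertion that fires in log_pull_tail() */
        assert(blks_free <= JD_BLOCKS);
        log_tail = new_tail;
}

int main(void)
{
        pull_tail(7000);        /* normal: tail moves forward, frees 1000 */
        printf("free after normal pull: %u\n", blks_free);

        /* With the head clobbered by the race, the next flush can derive
         * a "new" tail that is behind the real one; the modular distance
         * then counts 7692 blocks as freed and the assert trips. */
        pull_tail(6500);
        return 0;
}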

I've worked around it by taking the log_flush lock in the patch below, and it seems to avoid the problem. However, tracing the recovery process, I see that it sets sd_log_flush_head and then calls clean_journal() -> gfs2_write_log_header() -> gfs2_log_bmap() -> gfs2_log_incr_head(), which has:

        BUG_ON((sdp->sd_log_flush_head == sdp->sd_log_tail) &&
               (sdp->sd_log_flush_head != sdp->sd_log_head));

... but sd_log_tail and sd_log_head have not been set by gfs2_recover_func(), so it might still BUG_ON() during recovery if you're particularly unlucky.
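
For illustration, here's a tiny standalone example (block numbers entirely made up) of the kind of state recovery can leave behind that would satisfy that BUG_ON() condition, i.e. the flush head lands on the stale tail without also being at the stale head:

#include <assert.h>

int main(void)
{
        /* Hypothetical block numbers, for illustration only. */
        unsigned int log_flush_head = 4000; /* repositioned by recovery */
        unsigned int log_tail = 4000;       /* stale: untouched by recovery */
        unsigned int log_head = 6000;       /* stale: untouched by recovery */

        /* Same condition as the BUG_ON() above: the flush head sits on
         * the tail without also sitting on the head, so this fires. */
        assert(!(log_flush_head == log_tail &&
                 log_flush_head != log_head));
        return 0;
}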

I had a look at your "GFS2: Withdraw corruption patches [V2]" series, but I didn't see anything that might fix this.

If you think this patch is useful, then I can send it as a proper patch to the list.

Thanks,
Ross

--------------

gfs2: Take log_flush lock during recovery

Recovery sets sd_log_flush_head but does not take any locks, which means
a concurrent call to gfs2_log_flush can result in sd_log_head being set
to sd_log_flush_head. A later call to gfs2_log_flush will then hit an
assertion failure in log_pull_tail because the mismatch between
sd_log_head and sd_log_tail means too many blocks are freed.

gfs2: fsid=xapi-clusterd:88a31b8e-4072-b0.1: fatal: assertion "atomic_read(&sdp->sd_log_blks_free) <= sdp->sd_jdesc->jd_blocks" failed
function = log_pull_tail, file = fs/gfs2/log.c, line = 510

diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 0f501f938d1c..5f7c3271e909 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -498,6 +498,8 @@ void gfs2_recover_func(struct work_struct *work)
                        goto fail_gunlock_thaw;
                }

+               down_write(&sdp->sd_log_flush_lock);
+
                t_tlck = ktime_get();
                fs_info(sdp, "jid=%u: Replaying journal...\n", jd->jd_jid);

@@ -512,6 +514,8 @@ void gfs2_recover_func(struct work_struct *work)

                clean_journal(jd, &head);

+               up_write(&sdp->sd_log_flush_lock);
+
                gfs2_glock_dq_uninit(&thaw_gh);
                t_rep = ktime_get();
                fs_info(sdp, "jid=%u: Journal replayed in %lldms [jlck:%lldms, "
