Hi,

Now in the -nmw tree. Thanks,

Steve.

On Thu, 2013-04-25 at 12:49 -0400, Bob Peterson wrote:
> Hi,
>
> There was a timing window when a GFS2 file system was unmounted
> that caused GFS2 to call BUG() and panic the kernel. The call
> to BUG() is meant to ensure that the glock reference count,
> gl_ref, never gets down to zero and bounces back up again. What was
> happening during umount is that function gfs2_put_super was dequeuing
> its glocks for well-known files. In particular, we saw it on the
> journal glock, sd_jinode_gh. The dequeue caused delayed work to be
> queued for the glock state machine, to transition the lock to an
> "unlocked" state. While the work was still queued, gfs2_put_super
> called gfs2_gl_hash_clear to clear out the glock hash tables.
> If the timing was just so, the glock work function would drop the
> reference count at the time when it was being checked for zero,
> and that caused BUG() to be called. This patch calls
> flush_workqueue before clearing the glock hash tables, thereby
> ensuring that the delayed work is executed before the hash tables
> are cleared, and therefore the reference count never goes to zero
> until the glock is cleared.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>
> Signed-off-by: Bob Peterson <[email protected]>
> ---
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 3b9e178..b777691 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -1577,6 +1577,7 @@ static void dump_glock_func(struct gfs2_glock *gl)
>  void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
>  {
>  	set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
> +	flush_workqueue(glock_workqueue);
>  	glock_hash_walk(clear_glock, sdp);
>  	flush_workqueue(glock_workqueue);
>  	wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
>
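
For anyone following the thread, the ordering the patch relies on can be sketched as a small userspace model. This is not GFS2 code: toy_lock, toy_workqueue, toy_queue_work, toy_flush and toy_clear are hypothetical names made up for illustration, and the model assumes that each queued work item holds one reference that it drops when it runs. It is single-threaded, so it shows only the ordering (flush, then clear), not the race itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a glock: only a reference count. */
struct toy_lock {
	int ref;
};

#define MAX_PENDING 8

/* Hypothetical stand-in for the glock workqueue: locks with work queued. */
struct toy_workqueue {
	struct toy_lock *pending[MAX_PENDING];
	size_t count;
};

/* Queue deferred work. Assumption: the queued work holds a reference. */
static int toy_queue_work(struct toy_workqueue *wq, struct toy_lock *l)
{
	if (wq->count == MAX_PENDING)
		return -1;
	l->ref++;
	wq->pending[wq->count++] = l;
	return 0;
}

/* Run everything that is queued; each item drops the reference it held. */
static void toy_flush(struct toy_workqueue *wq)
{
	for (size_t i = 0; i < wq->count; i++)
		wq->pending[i]->ref--;
	wq->count = 0;
}

/*
 * Teardown in the order the patch uses: flush first, so no queued work
 * can touch the count later, then drop the last reference.
 * Returns the final reference count.
 */
static int toy_clear(struct toy_workqueue *wq, struct toy_lock *l)
{
	toy_flush(wq);
	assert(l->ref == 1);	/* only the table's reference is left */
	l->ref--;
	return l->ref;
}
```

In this model, a lock that starts with ref == 1 and has one work item queued goes to 2, back to 1 during the flush, and reaches 0 exactly once in toy_clear. Without the flush, the queued item would still be holding (and later dropping) a reference while the clear path is checking the count.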
