----- Original Message ----- > On Fri, Nov 15, 2019 at 4:45 PM Bob Peterson <[email protected]> wrote: > > > > Hi, > > > > This patch closes a timing window in which two processes compete > > and overlap in the execution of do_xmote for the same glock: > > > > Process A Process B > > ------------------------------------ ----------------------------- > > 1. Grabs gl_lockref and calls do_xmote > > 2. Grabs gl_lockref but is blocked > > 3. Sets GLF_INVALIDATE_IN_PROGRESS > > 4. Unlocks gl_lockref > > 5. Calls do_xmote > > 6. Call glops->go_sync > > 7. test_and_clear_bit GLF_DIRTY > > 8. Call gfs2_log_flush Call glops->go_sync > > 9. (slow IO, so it blocks a long time) test_and_clear_bit GLF_DIRTY > > It's not dirty (step 7) returns > > 10. Tests GLF_INVALIDATE_IN_PROGRESS > > 11. Calls go_inval (rgrp_go_inval) > > 12. gfs2_rgrp_relse does brelse > > 13. truncate_inode_pages_range > > 14. Calls lm_lock UN > > > > In step 14 we've just told dlm to give the glock to another node > > when, in fact, process A has not finished the IO and synced all > > buffer_heads to disk and make sure their revokes are done. > > > > This patch fixes the problem by changing the GLF_INVALIDATE_IN_PROGRESS > > to use test_and_set_bit, and if the bit is already set, process B just > > ignores it and trusts that process A will do the do_xmote in the proper > > order. > > > > Signed-off-by: Bob Peterson <[email protected]> > > --- > > fs/gfs2/glock.c | 14 +++++++++++++- > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c > > index faa88bd594e2..4a4a390ffd00 100644 > > --- a/fs/gfs2/glock.c > > +++ b/fs/gfs2/glock.c > > @@ -558,7 +558,19 @@ __acquires(&gl->gl_lockref.lock) > > GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target); > > if ((target == LM_ST_UNLOCKED || target == LM_ST_DEFERRED) && > > glops->go_inval) { > > - set_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags); > > + /* > > + * If another process is already doing the invalidate we > > should > > + * not interfere. If we call go_sync and it finds the glock > > is > > + * not dirty, we might call go_inval prematurely before the > > + * other go_sync has finished with its revokes. If we then > > call > > + * lm_lock prematurely, we've really screwed up: we cannot > > tell > > + * dlm to give the glock away until we're synced and > > + * invalidated. Best thing is to return and trust the other > > + * process will finish do_xmote tasks in their proper > > order. > > + */ > > That's a bit too much information here. Can we please change that as follows? > > * If another process is already doing the invalidate, let > that > * finish first. The glock state machine will get back to > this > * holder again later. > > > + if (test_and_set_bit(GLF_INVALIDATE_IN_PROGRESS, > > + &gl->gl_flags)) > > + return; > > do_error(gl, 0); /* Fail queued try locks */ > > } > > gl->gl_req = target; > > > > Thanks, > Andreas > > Sure. Make it so.
Bob
