Hi Bob, On Tue, Jul 27, 2021 at 7:37 PM Bob Peterson <rpete...@redhat.com> wrote: > Before this patch, function gfs2_ail1_empty could issue a file system > withdraw when IO errors were discovered. However, there are several > callers, including gfs2_flush_revokes() which holds the gfs2_log_lock > before calling gfs2_ail1_empty. If gfs2_ail1_empty needed to withdraw > it would leave the gfs2_log_lock held, which resulted in a deadlock > due to other processes that needed the log_lock. > > Another problem discovered by Christoph Helwig is that we cannot > withdraw from the log_flush process because it may be called from > the glock workqueue, and the withdraw process waits for that very > workqueue to be flushed. So the withdraw must be ignored until it may > be handled by a more appropriate context like the gfs2_logd daemon. > > This patch moves the withdraw out of function gfs2_ail1_empty and > makes each of the callers check for a withdraw by calling new function > check_ail1_withdraw.
> Function gfs2_flush_revokes now does this check > after releasing the gfs2_log_lock to avoid the deadlock. I don't see that in the code. > > Signed-off-by: Bob Peterson <rpete...@redhat.com> > --- > fs/gfs2/log.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c > index f0ee3ff6f9a8..c687a8c4e044 100644 > --- a/fs/gfs2/log.c > +++ b/fs/gfs2/log.c > @@ -364,11 +364,6 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp, int > max_revokes) > ret = list_empty(&sdp->sd_ail1_list); > spin_unlock(&sdp->sd_ail_lock); > > - if (test_bit(SDF_WITHDRAWING, &sdp->sd_flags)) { > - gfs2_lm(sdp, "fatal: I/O error(s)\n"); > - gfs2_withdraw(sdp); > - } > - > return ret; > } > > @@ -786,6 +781,15 @@ void gfs2_glock_remove_revoke(struct gfs2_glock *gl) > } > } > > +static void check_ail1_withdraw(struct gfs2_sbd *sdp) > +{ > + if (!test_bit(SDF_WITHDRAWING, &sdp->sd_flags)) > + return; > + > + gfs2_lm(sdp, "fatal: I/O error(s)\n"); > + gfs2_withdraw(sdp); > +} > + > /** > * gfs2_flush_revokes - Add as many revokes to the system transaction as we > can > * @sdp: The GFS2 superblock > @@ -1317,6 +1321,7 @@ int gfs2_logd(void *data) > > if (gfs2_jrnl_flush_reqd(sdp) || t == 0) { > gfs2_ail1_empty(sdp, 0); > + check_ail1_withdraw(sdp); > gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL | > GFS2_LFC_LOGD_JFLUSH_REQD); > } > @@ -1325,6 +1330,7 @@ int gfs2_logd(void *data) > gfs2_ail1_start(sdp); > gfs2_ail1_wait(sdp); > gfs2_ail1_empty(sdp, 0); > + check_ail1_withdraw(sdp); > gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL | > > GFS2_LFC_LOGD_AIL_FLUSH_REQD); > } > -- > 2.31.1 > Thanks, Andreas