On 7/28/21 12:38 AM, Andreas Gruenbacher wrote:
Hi Bob,
On Tue, Jul 27, 2021 at 7:37 PM Bob Peterson <rpete...@redhat.com> wrote:
Before this patch, function gfs2_ail1_empty could issue a file system
withdraw when IO errors were discovered. However, there are several
callers, including gfs2_flush_revokes(), which holds the gfs2_log_lock
before calling gfs2_ail1_empty. If gfs2_ail1_empty needed to withdraw
it would leave the gfs2_log_lock held, which resulted in a deadlock
due to other processes that needed the log_lock.
Another problem discovered by Christoph Hellwig is that we cannot
withdraw from the log_flush process because it may be called from
the glock workqueue, and the withdraw process waits for that very
workqueue to be flushed. So the withdraw must be deferred until it can
be handled by a more appropriate context like the gfs2_logd daemon.
This patch moves the withdraw out of function gfs2_ail1_empty and
makes each of the callers check for a withdraw by calling new function
check_ail1_withdraw.
Function gfs2_flush_revokes now does this check
after releasing the gfs2_log_lock to avoid the deadlock.
I don't see that in the code.
Yeah, the comment was wrong. I noticed the problem and already removed
the paragraph after the patch set was sent out.
Bob
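
For readers following the thread, the idea in the commit message (record the error where it is found, withdraw later from a safe context) can be illustrated with a small userspace sketch in C. This is not the patch's code: every sketch_ name, the struct layout, and the use of a pthread mutex in place of the kernel's log lock are assumptions made purely for illustration. The helper that may run under the lock only sets a flag; the check that acts on the flag runs with no lock held.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the superblock; not the real struct gfs2_sbd. */
struct sketch_sbd {
    pthread_mutex_t log_lock; /* plays the role of the log lock */
    bool io_error;            /* an I/O error was seen; guarded by log_lock */
    bool withdrawn;           /* the withdraw has run; guarded by log_lock */
};

/*
 * May be called with log_lock held. It only records the error and
 * never withdraws, so it cannot block on anything the caller holds.
 */
static void sketch_ail1_empty(struct sketch_sbd *sdp, bool saw_io_error)
{
    if (saw_io_error)
        sdp->io_error = true;
}

/*
 * Must be called WITHOUT log_lock held: the simulated withdraw takes
 * the lock itself. Returns true if this call performed the withdraw.
 */
static bool sketch_check_ail1_withdraw(struct sketch_sbd *sdp)
{
    bool did_withdraw = false;

    pthread_mutex_lock(&sdp->log_lock);
    if (sdp->io_error && !sdp->withdrawn) {
        sdp->withdrawn = true; /* stands in for the real withdraw work */
        did_withdraw = true;
    }
    pthread_mutex_unlock(&sdp->log_lock);
    return did_withdraw;
}

/* Hypothetical caller: detect under the lock, act after dropping it. */
static bool sketch_caller(struct sketch_sbd *sdp, bool saw_io_error)
{
    pthread_mutex_lock(&sdp->log_lock);
    sketch_ail1_empty(sdp, saw_io_error);
    pthread_mutex_unlock(&sdp->log_lock);

    return sketch_check_ail1_withdraw(sdp);
}
```

Calling sketch_caller() on a struct initialized with PTHREAD_MUTEX_INITIALIZER and both flags false withdraws at most once, on the first call that reports an error. Had sketch_ail1_empty() tried to withdraw while the caller still held log_lock, the second pthread_mutex_lock() on a default mutex would never return, which is the shape of the deadlock described in the commit message.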