Hi Edwin and all,

On Wed, 6 Mar 2019 at 12:08, Edwin Török <edvin.to...@citrix.com> wrote:
>
> Hello,
>
> I've been trying to debug a GFS2 deadlock that we see in our lab quite 
> frequently with a 4.19 kernel. With 4.4 and older kernels we were not able to 
> reproduce this.
> See below for lockdep dumps and stacktraces.
> Ignoring the lockdep warnings mentioned in my previous email I think I 
> narrowed down the problem to a deadlock between iomap and writeback:
>
> IIUC the sequence of calls leading up to this is:
>   aio_write ->
>       gfs2_file_write_iter ->
>             iomap_file_buffered_write ->
>               iomap_apply ->
>                  iomap_begin -> gfs2_iomap_begin_write -> gfs2_trans_begin -> down_read(&sdp->sd_log_flush_lock)
>                  iomap_write_actor ->
>                     balance_dirty_pages ->
>   ... waits for writeback ...

It took us several iterations, but that deadlock and all the follow-up
issues that kept popping up should be fixed now. The deadlock fix
(gfs2: Fix iomap write page reclaim deadlock) required a follow-up fix
(gfs2: Inode dirtying fix) and introduced a performance regression,
which was fixed separately (iomap: don't mark the inode dirty in
iomap_write_end). In addition, the function gfs2_walk_metadata from
v4.18 turned out to have issues (gfs2: gfs2_walk_metadata fix). This
should all be working fine in v5.3-rc4 now.

We're working on getting Bob's recovery patch queue ready for the next
merge window, which should fix the issues you've been seeing with
iSCSI failures. The changes for that will be close to what's on the
for-next.recovery10 branch in the gfs2 repository right now. If you
can still reproduce problems with that code base, please do let us
know.

Thanks,
Andreas
