On 5/7/19 9:32 PM, Andreas Gruenbacher wrote:
Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end.  This approach suffers from
two problems:

   (1) Any allocations necessary for the write are done in iomap_begin, so when
   the data aren't journaled, there is no need for keeping the transaction open
   until iomap_end.

   (2) Transactions keep the gfs2 log flush lock held.  When
   iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
   gfs2_write_inode, which will try to flush the log.  This requires taking the
   log flush lock which is already held, resulting in a deadlock.

Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end.  Instead, start a small transaction in page_prepare and end it in
page_done when necessary.

Unfortunately, this patch broke growing gfs2 filesystems. It is easy to reproduce:

$ mkfs.gfs2 -t xxx:yyy /dev/xvdb  4369065
$ mount /dev/xvdb /mnt
$ gfs2_grow /mnt (doesn't finish)
FS: Mount point:             /mnt
FS: Device:                  /dev/xvdb
FS: Size:                    4369062 (0x42aaa6)
DEV: Length:                 13107200 (0xc80000)
The file system will grow by 34133MB.

Looking at the kernel log, I see that it hits the following assertion and then hangs trying to withdraw the filesystem (presumably a separate problem):

gfs2: fsid=xxx:yyy.0: fatal: assertion "(nbuf <= tr->tr_blocks) && (tr->tr_num_revoke <= tr->tr_revokes)" failed
   function = gfs2_trans_end, file = fs/gfs2/trans.c, line = 117
gfs2: fsid=xxx:yyy.0: about to withdraw this file system

Rearranging the code so that it prints information about the transaction before the failed withdrawal attempt shows:

gfs2: fsid=xxx:yyy.0: Transaction created at: iomap_write_begin.constprop.45+0xbc/0x380
gfs2: fsid=xxx:yyy.0: blocks=1 revokes=0 reserved=8 touched=1
gfs2: fsid=xxx:yyy.0: Buf 1/0 Databuf 1/0 Revoke 0/0
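The dump appears to account for the failed assertion. Assuming gfs2_trans_end() computes nbuf as new metadata buffers minus removed ones plus new data buffers minus removed ones (as in fs/gfs2/trans.c at the time), the transaction reserved one block but added one metadata buffer and one data buffer. The struct and function names below are hypothetical; the fields simply mirror the printed counters.

```c
/*
 * Hypothetical mirror of the counters printed in the transaction dump.
 * Field names follow the dump, not the kernel's struct gfs2_trans.
 */
struct trans_counts {
    unsigned int blocks;                   /* "blocks=": reserved at trans begin */
    unsigned int revokes;                  /* "revokes=": revokes reserved */
    unsigned int buf_new, buf_rm;          /* "Buf new/rm" */
    unsigned int databuf_new, databuf_rm;  /* "Databuf new/rm" */
    unsigned int num_revoke;               /* first "Revoke" counter */
};

/* Net journal blocks used by the transaction (assumed formula). */
static unsigned int trans_nbuf(const struct trans_counts *tr)
{
    return tr->buf_new - tr->buf_rm + tr->databuf_new - tr->databuf_rm;
}

/* The condition asserted in gfs2_trans_end(). */
static int trans_within_reservation(const struct trans_counts *tr)
{
    return trans_nbuf(tr) <= tr->blocks && tr->num_revoke <= tr->revokes;
}
```

Plugging in the dump (blocks=1, revokes=0, Buf 1/0, Databuf 1/0, Revoke 0/0) gives nbuf = 2, which exceeds tr_blocks = 1, so the check fails.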

Reverting this commit fixes the issue. Tested with git master as of today (16d72dd4891fe).

Thanks,
--
Ross Lagerwall
