On Tue, Mar 07, 2017 at 12:49:58PM -0800, Liu Bo wrote: > On Mon, Mar 06, 2017 at 10:55:46AM +0800, Qu Wenruo wrote: > > [BUG] > > When btrfs_reloc_clone_csum() reports error, it can underflow metadata > > and leads to kernel assertion on outstanding extents in > > run_delalloc_nocow() and cow_file_range(). > > > > BTRFS info (device vdb5): relocating block group 12582912 flags data > > BTRFS info (device vdb5): found 1 extents > > assertion failed: inode->outstanding_extents >= num_extents, file: > > fs/btrfs//extent-tree.c, line: 5858 > > > > Currently, due to another bug blocking ordered extents, the bug is only > > reproducible under certain block group layout and using error injection. > > > > a) Create one data block group with one 4K extent in it. > > To avoid the bug that hangs btrfs due to ordered extent which never > > finishes > > b) Make btrfs_reloc_clone_csum() always fail > > c) Relocate that block group > > > > [CAUSE] > > run_delalloc_nocow() and cow_file_range() handles error from > > btrfs_reloc_clone_csum() wrongly: > > > > (The ascii chart shows a more generic case of this bug other than the > > bug mentioned above) > > > > |<------------------ delalloc range --------------------------->| > > | OE 1 | OE 2 | ... | OE n | > > |<----------- cleanup range --------------->| > > |<----------- ----------->| > > \/ > > btrfs_finish_ordered_io() range > > > > So error handler, which calls extent_clear_unlock_delalloc() with > > EXTENT_DELALLOC and EXTENT_DO_ACCOUNT bits, and btrfs_finish_ordered_io() > > will both cover OE n, and free its metadata, causing metadata under flow. > > > > [Fix] > > The fix is to ensure after calling btrfs_add_ordered_extent(), we only > > call error handler after increasing the iteration offset, so that > > cleanup range won't cover any created ordered extent. > > > > |<------------------ delalloc range --------------------------->| > > | OE 1 | OE 2 | ... | OE n | > > |<----------- ----------->|<---------- cleanup range --------->| > > \/ > > btrfs_finish_ordered_io() range > > > > Signed-off-by: Qu Wenruo <[email protected]> > > --- > > changelog: > > v6: > > New, split from v5 patch, as this is a separate bug. > > --- > > fs/btrfs/inode.c | 51 +++++++++++++++++++++++++++++++++++++++------------ > > 1 file changed, 39 insertions(+), 12 deletions(-) > > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index b2bc07aad1ae..1d83d504f2e5 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c > > @@ -998,15 +998,24 @@ static noinline int cow_file_range(struct inode > > *inode, > > BTRFS_DATA_RELOC_TREE_OBJECTID) { > > ret = btrfs_reloc_clone_csums(inode, start, > > cur_alloc_size); > > + /* > > + * Only drop cache here, and process as normal. > > + * > > + * We must not allow extent_clear_unlock_delalloc() > > + * at out_unlock label to free meta of this ordered > > + * extent, as its meta should be freed by > > + * btrfs_finish_ordered_io(). > > + * > > + * So we must continue until @start is increased to > > + * skip current ordered extent. > > + */ > > if (ret) > > - goto out_drop_extent_cache; > > + btrfs_drop_extent_cache(BTRFS_I(inode), start, > > + start + ram_size - 1, 0); > > } > > > > btrfs_dec_block_group_reservations(fs_info, ins.objectid); > > > > - if (disk_num_bytes < cur_alloc_size) > > - break; > > - > > /* we're not doing compressed IO, don't unlock the first > > * page (which the caller expects to stay locked), don't > > * clear any dirty bits and don't set any writeback bits > > @@ -1022,10 +1031,21 @@ static noinline int cow_file_range(struct inode > > *inode, > > delalloc_end, locked_page, > > EXTENT_LOCKED | EXTENT_DELALLOC, > > op); > > - disk_num_bytes -= cur_alloc_size; > > + if (disk_num_bytes > cur_alloc_size) > > + disk_num_bytes = 0; > > + else > > + disk_num_bytes -= cur_alloc_size; > > I don't get the logic here, why do we 'break' if disk_num_bytes > > cur_alloc_size?
I assume that you've run fstests against this patch, if so, I actually start worrying about that no fstests found this problem. Thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
