On 2018/8/28 1:54 PM, Qu Wenruo wrote:
> Due to the limitation of btrfs_cross_ref_exist(), run_delalloc_nocow()
> can still fall back to CoW even if only an (unrelated) part of the
> preallocated extent is shared.
>
> This makes the following case do unnecessary CoW:
>
>  # xfs_io -f -c "falloc 0 2M" $mnt/file
>  # xfs_io -c "pwrite 0 1M" $mnt/file
>  # xfs_io -c "reflink $mnt/file 1M 4M 1M" $mnt/file
>  # sync
>
> The pwrite will still be CoWed, since at writeback time, the
> preallocated extent is already shared, btrfs_cross_ref_exist() will
> return 1 and make run_delalloc_nocow() fall back to cow_file_range().
>
> This is definitely an overkill workaround, but this should be the
> simplest way without further screwing up the already complex NOCOW routine.
Err, this is not even a working workaround.

It could still lead to a bytes_may_use underflow as long as
btrfs_cross_ref_exist() can return 1 for a partly shared prealloc extent.

So please ignore this patch.

Thanks,
Qu

>
> Signed-off-by: Qu Wenruo <w...@suse.com>
> ---
>  fs/btrfs/ctree.h |  1 +
>  fs/btrfs/file.c  |  4 ++--
>  fs/btrfs/ioctl.c | 21 +++++++++++++++++++++
>  3 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 53af9f5253f4..ddacc41ff124 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3228,6 +3228,7 @@ int btrfs_add_inode_defrag(struct btrfs_trans_handle *trans,
>  			   struct btrfs_inode *inode);
>  int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info);
>  void btrfs_cleanup_defrag_inodes(struct btrfs_fs_info *fs_info);
> +int btrfs_start_ordered_ops(struct inode *inode, loff_t start, loff_t end);
>  int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync);
>  void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
>  			     int skip_pinned);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 2be00e873e92..118bfd019c6c 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1999,7 +1999,7 @@ int btrfs_release_file(struct inode *inode, struct file *filp)
>  	return 0;
>  }
>
> -static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
> +int btrfs_start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
>  {
>  	int ret;
>  	struct blk_plug plug;
> @@ -2056,7 +2056,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  	 * multi-task, and make the performance up. See
>  	 * btrfs_wait_ordered_range for an explanation of the ASYNC check.
>  	 */
> -	ret = start_ordered_ops(inode, start, end);
> +	ret = btrfs_start_ordered_ops(inode, start, end);
>  	if (ret)
>  		goto out;
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 63600dc2ac4c..866979f530bc 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4266,6 +4266,27 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
>  		goto out_unlock;
>  	}
>
> +	/*
> +	 * btrfs_cross_ref_exist() only does check at extent level,
> +	 * we could cause unexpected NOCOW write to be COWed.
> +	 * E.g.:
> +	 * falloc 0 2M file1
> +	 * pwrite 0 1M file1 (at this point it should go NOCOW)
> +	 * reflink src=file1 srcoff=1M dst=file1 dstoff=4M len=1M
> +	 * sync
> +	 *
> +	 * In above case, due to the preallocated extent is shared
> +	 * the data at 0~1M can't go NOCOW.
> +	 *
> +	 * So flush the whole src inode to avoid any unneeded CoW.
> +	 */
> +	ret = btrfs_start_ordered_ops(src, 0, -1);
> +	if (ret < 0)
> +		goto out_unlock;
> +	ret = btrfs_wait_ordered_range(src, 0, -1);
> +	if (ret < 0)
> +		goto out_unlock;
> +
>  	/*
>  	 * Lock the target range too. Right after we replace the file extent
>  	 * items in the fs tree (which now point to the cloned data), we might
>
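
For readers following the thread, the extent-level limitation described in the quoted commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct file_ref`, `after_reflink`, `extent_level_shared()` and `range_level_shared()` are hypothetical names, and "more than one reference" is a simplification of what btrfs_cross_ref_exist() actually checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical model of one file extent item pointing into an on-disk extent. */
struct file_ref {
	uint64_t extent_id;  /* on-disk extent being referenced */
	uint64_t extent_off; /* start of the referenced part inside that extent */
	uint64_t len;        /* length of the referenced part */
};

/*
 * Assumed state after the reflink in the example: the original 2M prealloc
 * item, plus the clone (placed at file offset 4M) of its second 1M.
 */
const struct file_ref after_reflink[] = {
	{ .extent_id = 1, .extent_off = 0,       .len = 2 * MiB },
	{ .extent_id = 1, .extent_off = 1 * MiB, .len = 1 * MiB },
};

/* Extent-level check: any second reference to the extent counts as shared. */
bool extent_level_shared(const struct file_ref *refs, size_t n, uint64_t id)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (refs[i].extent_id == id)
			count++;
	return count > 1;
}

/* Range-level check: only references overlapping [off, off + len) count. */
bool range_level_shared(const struct file_ref *refs, size_t n, uint64_t id,
			uint64_t off, uint64_t len)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		const struct file_ref *r = &refs[i];

		if (r->extent_id == id &&
		    r->extent_off < off + len && off < r->extent_off + r->len)
			count++;
	}
	return count > 1;
}
```

In this model, extent_level_shared(after_reflink, 2, 1) is true, so a write to 0~1M would be treated as shared, while range_level_shared(after_reflink, 2, 1, 0, MiB) is false because only the second 1M of the extent has a second reference.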