On 2018/8/28 1:54 PM, Qu Wenruo wrote:
> Due to the limitation of btrfs_cross_ref_exist(), run_delalloc_nocow()
> can still fall back to CoW even when only an (unrelated) part of the
> preallocated extent is shared.
> 
> This makes the following case do unnecessary CoW:
> 
>  # xfs_io -f -c "falloc 0 2M" $mnt/file
>  # xfs_io -c "pwrite 0 1M" $mnt/file
>  # xfs_io -c "reflink $mnt/file 1M 4M 1M" $mnt/file
>  # sync
> 
> The pwrite will still be CoWed: at writeback time the preallocated
> extent is already shared, so btrfs_cross_ref_exist() returns 1 and
> makes run_delalloc_nocow() fall back to cow_file_range().
> 
> This is definitely an overkill workaround, but it should be the
> simplest way without further complicating the already complex NOCOW
> routine.

Err, this is not even a working workaround.

It could still lead to a bytes_may_use underflow as long as
btrfs_cross_ref_exist() can return 1 for a partly shared prealloc extent.
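
To make the extent-level limitation from the quoted commit message concrete,
here is a minimal userspace sketch. It is a toy model, not kernel code: the
toy_* names, the structs and the range-level check are all hypothetical and
exist only for illustration. It models the falloc/pwrite/reflink case from the
commit message and compares an extent-level "is this shared?" check with a
check limited to the byte range actually being written.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MIB (1024ULL * 1024ULL)

/* Hypothetical model of one file extent item pointing into a data extent. */
struct toy_ref {
	uint64_t file_off;	/* file offset covered by this item */
	uint64_t ext_off;	/* offset into the data extent */
	uint64_t len;		/* number of bytes referenced */
};

/* Hypothetical model of a data extent and everything that references it. */
struct toy_extent {
	const struct toy_ref *refs;
	size_t nr_refs;
};

/* Extent-level check: any reference other than "self" counts as shared. */
static int toy_cross_ref_exist(const struct toy_extent *ext,
			       const struct toy_ref *self)
{
	for (size_t i = 0; i < ext->nr_refs; i++)
		if (&ext->refs[i] != self)
			return 1;
	return 0;
}

/* Range-level check: only other references overlapping the range count. */
static int toy_range_shared(const struct toy_extent *ext,
			    const struct toy_ref *self,
			    uint64_t ext_off, uint64_t len)
{
	for (size_t i = 0; i < ext->nr_refs; i++) {
		const struct toy_ref *r = &ext->refs[i];

		if (r == self)
			continue;
		if (r->ext_off < ext_off + len && ext_off < r->ext_off + r->len)
			return 1;
	}
	return 0;
}

/*
 * Scenario from the commit message:
 *   falloc 0 2M, pwrite 0 1M, reflink srcoff=1M dstoff=4M len=1M
 * The write targets bytes [0, 1M) of the preallocated extent.
 */
void toy_scenario(int *extent_level, int *range_level)
{
	static const struct toy_ref refs[] = {
		{ 0,           0,       2 * TOY_MIB },	/* falloc'ed range */
		{ 4 * TOY_MIB, TOY_MIB, TOY_MIB },	/* reflinked tail */
	};
	const struct toy_extent ext = { refs, 2 };

	*extent_level = toy_cross_ref_exist(&ext, &refs[0]);
	*range_level = toy_range_shared(&ext, &refs[0], 0, TOY_MIB);
}
```

In this model the extent-level check reports the extent as shared, although
nothing else references the first 1M that the pwrite touches, which is the
kind of result that pushes the write off the NOCOW path.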

So please ignore this patch.

Thanks,
Qu

> 
> Signed-off-by: Qu Wenruo <w...@suse.com>
> ---
>  fs/btrfs/ctree.h |  1 +
>  fs/btrfs/file.c  |  4 ++--
>  fs/btrfs/ioctl.c | 21 +++++++++++++++++++++
>  3 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 53af9f5253f4..ddacc41ff124 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3228,6 +3228,7 @@ int btrfs_add_inode_defrag(struct btrfs_trans_handle *trans,
>                          struct btrfs_inode *inode);
>  int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info);
>  void btrfs_cleanup_defrag_inodes(struct btrfs_fs_info *fs_info);
> +int btrfs_start_ordered_ops(struct inode *inode, loff_t start, loff_t end);
>  int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync);
>  void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
>                            int skip_pinned);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 2be00e873e92..118bfd019c6c 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1999,7 +1999,7 @@ int btrfs_release_file(struct inode *inode, struct file *filp)
>       return 0;
>  }
>  
> -static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
> +int btrfs_start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
>  {
>       int ret;
>       struct blk_plug plug;
> @@ -2056,7 +2056,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>        * multi-task, and make the performance up.  See
>        * btrfs_wait_ordered_range for an explanation of the ASYNC check.
>        */
> -     ret = start_ordered_ops(inode, start, end);
> +     ret = btrfs_start_ordered_ops(inode, start, end);
>       if (ret)
>               goto out;
>  
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 63600dc2ac4c..866979f530bc 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4266,6 +4266,27 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
>                       goto out_unlock;
>       }
>  
> +     /*
> +      * btrfs_cross_ref_exist() only checks at extent level, so a
> +      * write that should go NOCOW could be unexpectedly COWed.
> +      * E.g.:
> +      * falloc 0 2M file1
> +      * pwrite 0 1M file1 (at this point it should go NOCOW)
> +      * reflink src=file1 srcoff=1M dst=file1 dstoff=4M len=1M
> +      * sync
> +      *
> +      * In the above case, because the preallocated extent is shared,
> +      * the data at 0~1M can't go NOCOW.
> +      *
> +      * So flush the whole src inode to avoid any unneeded CoW.
> +      */
> +     ret = btrfs_start_ordered_ops(src, 0, -1);
> +     if (ret < 0)
> +             goto out_unlock;
> +     ret = btrfs_wait_ordered_range(src, 0, -1);
> +     if (ret < 0)
> +             goto out_unlock;
> +
>       /*
>        * Lock the target range too. Right after we replace the file extent
>        * items in the fs tree (which now point to the cloned data), we might
> 
