On Thu, Jan 24, 2013 at 08:13:34PM +0800, Miao Xie wrote:
> On    thu, 24 Jan 2013 11:58:13 +0000, Hugo Mills wrote:
> > On Thu, Jan 24, 2013 at 07:39:17PM +0800, Miao Xie wrote:
> >> On thu, 24 Jan 2013 10:53:17 +0200, Alex Lyakas wrote:
> >>> Hi Chen,
> >>> with all due respect, what do you mean by "I see" and "OK for users"?
> >>> The semantics of the fallocate API with/without the
> >>> FALLOC_FL_PUNCH_HOLE flag is not defined by me or you. As far as I
> >>> understand, the file system *must* adhere to the semantics of this
> >>> API, as defined elsewhere.
> >>> Looking, for example, here:
> >>> http://man7.org/linux/man-pages/man2/fallocate.2.html
> >>>
> >>> "Allocating disk space
> >>> The default operation (i.e., mode is zero) of fallocate() allocates
> >>> and initializes to zero the disk space within the range specified by
> >>> offset and len."
> >>>
> >>> "Deallocating file space
> >>> Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux
> >>> 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte
> >>> range starting at offset and continuing for len bytes.  Within the
> >>> specified range, partial file system blocks are zeroed, and whole file
> >>> system blocks are removed from the file."
> >>>
> >>> These are clearly two different modes of operation, and I don't think
> >>> you or me can decide otherwise, at this point.
> >>
> >> I think send/receive commands just need ensure the content of the file is
> >> right, not the disk format. That is also the reason why the developers just
> >> send out the zero data when they meet a hole or pre-allocated extent. From
> >> this viewpoint, they are the same.
> > 
> >    I disagree on the "content vs representation" comment above. I
> > would very much hope that the reference receive implementation can
> > turn a string of zeroes (or a hole) back into a sparse file. As a
> > user, I'd be quite irritated if, say, one of my sparse VM images ended
> > up 5 times the size when I backed it up with send/receive, simply
> > because it's gone from holey to a huge bunch of zeroes. That
> > particular issue can be dealt with at the receiver level, though.
> 
> My explanation is not clear, sorry! What I want to say is send command needn't
> differentiate between a hole and a pre-allocated extent, because both of them
> are considered as zero. So just send out a hole even we meet a pre-allocated
> extent is OK, I think.

   Ah, gotcha. That sounds more sensible, and I agree with your point
(for what little weight I carry -- I'm not as well-versed in the
behaviour of fallocate as I might be).

   Hugo.

> Thanks
> Miao 
> 
> > 
> >    Hugo.
> > 
> >>> However, I may be not knowledgeable enough to confirm this.
> >>> Jan/Alexander, can you perhaps comment on this?
> >>>
> >>> Thanks,
> >>> Alex.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Jan 24, 2013 at 3:54 AM, Chen Yang <[email protected]> 
> >>> wrote:
> >>>> 于 2013-1-23 19:56, Alex Lyakas 写道:
> >>>>> Hi,
> >>>>>
> >>>>> On Wed, Jan 23, 2013 at 1:04 PM, Chen Yang 
> >>>>> <[email protected]> wrote:
> >>>>>> From: Chen Yang <[email protected]>
> >>>>>> Date: Wed, 23 Jan 2013 11:21:51 +0800
> >>>>>> Subject: [PATCH] Btrfs/send: sparse and pre-allocated file support for
> >>>>>>  btrfs-send mechanism
> >>>>>>
> >>>>>> When sending a file with sparse or pre-allocated part,
> >>>>>> these parts will be sent as ZERO streams, and it's unnecessary.
> >>>>>>
> >>>>>> There are two ways to improve this, one is just skip the EMPTY parts,
> >>>>>> and the other one is to add a punch command to send, when an EMPTY 
> >>>>>> parts
> >>>>>> was detected. But considering a case of incremental sends, if we choose
> >>>>>> the first one, when a hole got punched into the file after the initial
> >>>>>> send, the data will be unchanged on the receiving side when received
> >>>>>> incrementally. So the second choice is right.
> >>>>>>
> >>>>>> Signed-off-by: Cheng Yang <[email protected]>
> >>>>>> ---
> >>>>>>  fs/btrfs/send.c |   60 
> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>>>  fs/btrfs/send.h |    3 +-
> >>>>>>  2 files changed, 61 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> >>>>>> index 5445454..31e9aef 100644
> >>>>>> --- a/fs/btrfs/send.c
> >>>>>> +++ b/fs/btrfs/send.c
> >>>>>> @@ -3585,6 +3585,52 @@ out:
> >>>>>>         return ret;
> >>>>>>  }
> >>>>>>
> >>>>>> +static int send_punch(struct send_ctx *sctx, u64 offset, u32 len)
> >>>>>> +{
> >>>>>> +       int ret = 0;
> >>>>>> +       struct fs_path *p;
> >>>>>> +       mm_segment_t old_fs;
> >>>>>> +
> >>>>>> +       p = fs_path_alloc(sctx);
> >>>>>> +       if (!p)
> >>>>>> +               return -ENOMEM;
> >>>>>> +
> >>>>>> +       /*
> >>>>>> +        * vfs normally only accepts user space buffers for security 
> >>>>>> reasons.
> >>>>>> +        * we only read from the file and also only provide the 
> >>>>>> read_buf buffer
> >>>>>> +        * to vfs. As this buffer does not come from a user space 
> >>>>>> call, it's
> >>>>>> +        * ok to temporary allow kernel space buffers.
> >>>>>> +        */
> >>>>>> +       old_fs = get_fs();
> >>>>>> +       set_fs(KERNEL_DS);
> >>>>>> +
> >>>>>> +verbose_printk("btrfs: send_fallocate offset=%llu, len=%d\n", offset, 
> >>>>>> len);
> >>>>>> +
> >>>>>> +       ret = open_cur_inode_file(sctx);
> >>>>>> +       if (ret < 0)
> >>>>>> +               goto out;
> >>>>>> +
> >>>>>> +       ret = begin_cmd(sctx, BTRFS_SEND_C_PUNCH);
> >>>>>> +       if (ret < 0)
> >>>>>> +               goto out;
> >>>>>> +
> >>>>>> +       ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, 
> >>>>>> p);
> >>>>>> +       if (ret < 0)
> >>>>>> +               goto out;
> >>>>>> +
> >>>>>> +       TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
> >>>>>> +       TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
> >>>>>> +       TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len);
> >>>>>> +
> >>>>>> +       ret = send_cmd(sctx);
> >>>>>> +
> >>>>>> +tlv_put_failure:
> >>>>>> +out:
> >>>>>> +       fs_path_free(sctx, p);
> >>>>>> +       set_fs(old_fs);
> >>>>>> +       return ret;
> >>>>>> +}
> >>>>>> +
> >>>>>>  /*
> >>>>>>   * Read some bytes from the current inode/file and send a write 
> >>>>>> command to
> >>>>>>   * user space.
> >>>>>> @@ -3718,6 +3764,7 @@ static int send_write_or_clone(struct send_ctx 
> >>>>>> *sctx,
> >>>>>>         u64 pos = 0;
> >>>>>>         u64 len;
> >>>>>>         u32 l;
> >>>>>> +       u64 bytenr;
> >>>>>>         u8 type;
> >>>>>>
> >>>>>>         ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
> >>>>>> @@ -3731,8 +3778,19 @@ static int send_write_or_clone(struct send_ctx 
> >>>>>> *sctx,
> >>>>>>                  * sure to send the whole thing
> >>>>>>                  */
> >>>>>>                 len = PAGE_CACHE_ALIGN(len);
> >>>>>> -       } else {
> >>>>>> +       } else if (type == BTRFS_FILE_EXTENT_REG) {
> >>>>>>                 len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
> >>>>>> +               bytenr = btrfs_file_extent_disk_bytenr(path->nodes[0], 
> >>>>>> ei);
> >>>>>> +               if (bytenr == 0) {
> >>>>>> +                       ret = send_punch(sctx, offset, len);
> >>>>>> +                       goto out;
> >>>>>> +               }
> >>>>>> +       } else if (type == BTRFS_FILE_EXTENT_PREALLOC) {
> >>>>>> +               len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
> >>>>>> +               ret = send_punch(sctx, offset, len);
> >>>>>> +               goto out;
> >>>>>> +       } else {
> >>>>>> +               BUG();
> >>>>>>         }
> >>>>>
> >>>>> Are these two cases really the same? In the bytenr == 0 we want to
> >>>>> deallocate the range. While in the prealloc case, we want to ensure
> >>>>> disk space allocation. Or am I mistaken?
> >>>>> Looking at the receive side, you use the same command for both:
> >>>>> ret = fallocate(r->write_fd,
> >>>>>                FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> >>>>>                offset, len);
> >>>>>
> >>>>> Looking at btrfs_fallocate code, in that case it will always punch the 
> >>>>> hole:
> >>>>> static long btrfs_fallocate(struct file *file, int mode,
> >>>>>                           loff_t offset, loff_t len)
> >>>>> ...
> >>>>>       if (mode & FALLOC_FL_PUNCH_HOLE)
> >>>>>               return btrfs_punch_hole(inode, offset, len);
> >>>>>
> >>>>> So maybe you should have two different commands, or add a flag to
> >>>>> distinguish between the two cases?
> >>>>>
> >>>>> Thanks,
> >>>>> Alex.
> >>>>>
> >>>>
> >>>> I see sparse and pre-allocated parts both as EMPTY hole,
> >>>> and although different for a file, it's OK for users.
> >>>>
> >>>> Thanks
> >>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>         if (offset + len > sctx->cur_inode_size)
> >>>>>> diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
> >>>>>> index 1bf4f32..659ac8f 100644
> >>>>>> --- a/fs/btrfs/send.h
> >>>>>> +++ b/fs/btrfs/send.h
> >>>>>> @@ -20,7 +20,7 @@
> >>>>>>  #include "ctree.h"
> >>>>>>
> >>>>>>  #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
> >>>>>> -#define BTRFS_SEND_STREAM_VERSION 1
> >>>>>> +#define BTRFS_SEND_STREAM_VERSION 2
> >>>>>>
> >>>>>>  #define BTRFS_SEND_BUF_SIZE (1024 * 64)
> >>>>>>  #define BTRFS_SEND_READ_SIZE (1024 * 48)
> >>>>>> @@ -80,6 +80,7 @@ enum btrfs_send_cmd {
> >>>>>>         BTRFS_SEND_C_WRITE,
> >>>>>>         BTRFS_SEND_C_CLONE,
> >>>>>>
> >>>>>> +       BTRFS_SEND_C_PUNCH,
> >>>>>>         BTRFS_SEND_C_TRUNCATE,
> >>>>>>         BTRFS_SEND_C_CHMOD,
> >>>>>>         BTRFS_SEND_C_CHOWN,
> > 
> 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- UNIX: British manufacturer of modular shelving units. ---      

Attachment: signature.asc
Description: Digital signature

Reply via email to