On thu, 24 Jan 2013 11:58:13 +0000, Hugo Mills wrote: > On Thu, Jan 24, 2013 at 07:39:17PM +0800, Miao Xie wrote: >> On thu, 24 Jan 2013 10:53:17 +0200, Alex Lyakas wrote: >>> Hi Chen, >>> with all due respect, what do you mean by "I see" and "OK for users"? >>> The semantics of the fallocate API with/without the >>> FALLOC_FL_PUNCH_HOLE flag is not defined by me or you. As far as I >>> understand, the file system *must* adhere to the semantics of this >>> API, as defined elsewhere. >>> Looking, for example, here: >>> http://man7.org/linux/man-pages/man2/fallocate.2.html >>> >>> "Allocating disk space >>> The default operation (i.e., mode is zero) of fallocate() allocates >>> and initializes to zero the disk space within the range specified by >>> offset and len." >>> >>> "Deallocating file space >>> Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux >>> 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte >>> range starting at offset and continuing for len bytes. Within the >>> specified range, partial file system blocks are zeroed, and whole file >>> system blocks are removed from the file." >>> >>> These are clearly two different modes of operation, and I don't think >>> you or me can decide otherwise, at this point. >> >> I think send/receive commands just need ensure the content of the file is >> right, not the disk format. That is also the reason why the developers just >> send out the zero data when they meet a hole or pre-allocated extent. From >> this viewpoint, they are the same. > > I disagree on the "content vs representation" comment above. I > would very much hope that the reference receive implementation can > turn a string of zeroes (or a hole) back into a sparse file. As a > user, I'd be quite irritated if, say, one of my sparse VM images ended > up 5 times the size when I backed it up with send/receive, simply > because it's gone from holey to a huge bunch of zeroes. That > particular issue can be dealt with at the receiver level, though.
My explanation is not clear, sorry! What I want to say is send command needn't differentiate between a hole and a pre-allocated extent, because both of them are considered as zero. So just send out a hole even we meet a pre-allocated extent is OK, I think. Thanks Miao > > Hugo. > >>> However, I may be not knowledgeable enough to confirm this. >>> Jan/Alexander, can you perhaps comment on this? >>> >>> Thanks, >>> Alex. >>> >>> >>> >>> >>> >>> >>> >>> On Thu, Jan 24, 2013 at 3:54 AM, Chen Yang <[email protected]> >>> wrote: >>>> δΊ 2013-1-23 19:56, Alex Lyakas ει: >>>>> Hi, >>>>> >>>>> On Wed, Jan 23, 2013 at 1:04 PM, Chen Yang <[email protected]> >>>>> wrote: >>>>>> From: Chen Yang <[email protected]> >>>>>> Date: Wed, 23 Jan 2013 11:21:51 +0800 >>>>>> Subject: [PATCH] Btrfs/send: sparse and pre-allocated file support for >>>>>> btrfs-send mechanism >>>>>> >>>>>> When sending a file with sparse or pre-allocated part, >>>>>> these parts will be sent as ZERO streams, and it's unnecessary. >>>>>> >>>>>> There are two ways to improve this, one is just skip the EMPTY parts, >>>>>> and the other one is to add a punch command to send, when an EMPTY parts >>>>>> was detected. But considering a case of incremental sends, if we choose >>>>>> the first one, when a hole got punched into the file after the initial >>>>>> send, the data will be unchanged on the receiving side when received >>>>>> incrementally. So the second choice is right. >>>>>> >>>>>> Signed-off-by: Cheng Yang <[email protected]> >>>>>> --- >>>>>> fs/btrfs/send.c | 60 >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++- >>>>>> fs/btrfs/send.h | 3 +- >>>>>> 2 files changed, 61 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >>>>>> index 5445454..31e9aef 100644 >>>>>> --- a/fs/btrfs/send.c >>>>>> +++ b/fs/btrfs/send.c >>>>>> @@ -3585,6 +3585,52 @@ out: >>>>>> return ret; >>>>>> } >>>>>> >>>>>> +static int send_punch(struct send_ctx *sctx, u64 offset, u32 len) >>>>>> +{ >>>>>> + int ret = 0; >>>>>> + struct fs_path *p; >>>>>> + mm_segment_t old_fs; >>>>>> + >>>>>> + p = fs_path_alloc(sctx); >>>>>> + if (!p) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + /* >>>>>> + * vfs normally only accepts user space buffers for security >>>>>> reasons. >>>>>> + * we only read from the file and also only provide the read_buf >>>>>> buffer >>>>>> + * to vfs. As this buffer does not come from a user space call, >>>>>> it's >>>>>> + * ok to temporary allow kernel space buffers. >>>>>> + */ >>>>>> + old_fs = get_fs(); >>>>>> + set_fs(KERNEL_DS); >>>>>> + >>>>>> +verbose_printk("btrfs: send_fallocate offset=%llu, len=%d\n", offset, >>>>>> len); >>>>>> + >>>>>> + ret = open_cur_inode_file(sctx); >>>>>> + if (ret < 0) >>>>>> + goto out; >>>>>> + >>>>>> + ret = begin_cmd(sctx, BTRFS_SEND_C_PUNCH); >>>>>> + if (ret < 0) >>>>>> + goto out; >>>>>> + >>>>>> + ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); >>>>>> + if (ret < 0) >>>>>> + goto out; >>>>>> + >>>>>> + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); >>>>>> + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); >>>>>> + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); >>>>>> + >>>>>> + ret = send_cmd(sctx); >>>>>> + >>>>>> +tlv_put_failure: >>>>>> +out: >>>>>> + fs_path_free(sctx, p); >>>>>> + set_fs(old_fs); >>>>>> + return ret; >>>>>> +} >>>>>> + >>>>>> /* >>>>>> * Read some bytes from the current inode/file and send a write command >>>>>> to >>>>>> * user space. >>>>>> @@ -3718,6 +3764,7 @@ static int send_write_or_clone(struct send_ctx >>>>>> *sctx, >>>>>> u64 pos = 0; >>>>>> u64 len; >>>>>> u32 l; >>>>>> + u64 bytenr; >>>>>> u8 type; >>>>>> >>>>>> ei = btrfs_item_ptr(path->nodes[0], path->slots[0], >>>>>> @@ -3731,8 +3778,19 @@ static int send_write_or_clone(struct send_ctx >>>>>> *sctx, >>>>>> * sure to send the whole thing >>>>>> */ >>>>>> len = PAGE_CACHE_ALIGN(len); >>>>>> - } else { >>>>>> + } else if (type == BTRFS_FILE_EXTENT_REG) { >>>>>> len = btrfs_file_extent_num_bytes(path->nodes[0], ei); >>>>>> + bytenr = btrfs_file_extent_disk_bytenr(path->nodes[0], >>>>>> ei); >>>>>> + if (bytenr == 0) { >>>>>> + ret = send_punch(sctx, offset, len); >>>>>> + goto out; >>>>>> + } >>>>>> + } else if (type == BTRFS_FILE_EXTENT_PREALLOC) { >>>>>> + len = btrfs_file_extent_num_bytes(path->nodes[0], ei); >>>>>> + ret = send_punch(sctx, offset, len); >>>>>> + goto out; >>>>>> + } else { >>>>>> + BUG(); >>>>>> } >>>>> >>>>> Are these two cases really the same? In the bytenr == 0 we want to >>>>> deallocate the range. While in the prealloc case, we want to ensure >>>>> disk space allocation. Or am I mistaken? >>>>> Looking at the receive side, you use the same command for both: >>>>> ret = fallocate(r->write_fd, >>>>> FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, >>>>> offset, len); >>>>> >>>>> Looking at btrfs_fallocate code, in that case it will always punch the >>>>> hole: >>>>> static long btrfs_fallocate(struct file *file, int mode, >>>>> loff_t offset, loff_t len) >>>>> ... >>>>> if (mode & FALLOC_FL_PUNCH_HOLE) >>>>> return btrfs_punch_hole(inode, offset, len); >>>>> >>>>> So maybe you should have two different commands, or add a flag to >>>>> distinguish between the two cases? >>>>> >>>>> Thanks, >>>>> Alex. >>>>> >>>> >>>> I see sparse and pre-allocated parts both as EMPTY hole, >>>> and although different for a file, it's OK for users. >>>> >>>> Thanks >>>> >>>>> >>>>> >>>>>> >>>>>> if (offset + len > sctx->cur_inode_size) >>>>>> diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h >>>>>> index 1bf4f32..659ac8f 100644 >>>>>> --- a/fs/btrfs/send.h >>>>>> +++ b/fs/btrfs/send.h >>>>>> @@ -20,7 +20,7 @@ >>>>>> #include "ctree.h" >>>>>> >>>>>> #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream" >>>>>> -#define BTRFS_SEND_STREAM_VERSION 1 >>>>>> +#define BTRFS_SEND_STREAM_VERSION 2 >>>>>> >>>>>> #define BTRFS_SEND_BUF_SIZE (1024 * 64) >>>>>> #define BTRFS_SEND_READ_SIZE (1024 * 48) >>>>>> @@ -80,6 +80,7 @@ enum btrfs_send_cmd { >>>>>> BTRFS_SEND_C_WRITE, >>>>>> BTRFS_SEND_C_CLONE, >>>>>> >>>>>> + BTRFS_SEND_C_PUNCH, >>>>>> BTRFS_SEND_C_TRUNCATE, >>>>>> BTRFS_SEND_C_CHMOD, >>>>>> BTRFS_SEND_C_CHOWN, > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
