btrfs-progs v3.14 mkfs.btrfs bug: --features long-option segfaults, -O short-option OK
Using btrfs-progs v3.14:

mkfs.btrfs ... --features ... segfaults.
mkfs.btrfs ... -O ... works fine.

* I used other long options, so it's not simply a problem parsing long options.

* I tried the long option with various features; none worked, including --features list-all with no other options. That too segfaulted, tho -O list-all worked.

Segfault line as logged:

mkfs.btrfs[5231]: segfault at 0 ip 7fb531c7ab1a sp 7fffaf1a7a78 error 4 in libc-2.19.so[7fb531bf8000+19a000]

btrfs-progs v3.14 from git on kernel v3.14 from git, gentoo/~amd64, built with gcc-4.8.2, using glibc-2.19.

-- 
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-progs v3.14 mkfs.btrfs bug: --features long-option segfaults, -O short-option OK
On Wed, 16 Apr 2014 06:41:44 +, Duncan wrote:

Using btrfs-progs v3.14: mkfs.btrfs ... --features ... segfaults.

Can reproduce (also with glibc 2.19 on Gentoo ;-) and building with debug found:

(gdb) bt
#0 0x76f3aaea in strlen () from /lib64/libc.so.6
#1 0x76f3a82e in strdup () from /lib64/libc.so.6
#2 0x004213e8 in main (ac=2, av=0x7fffe308) at mkfs.c:1312

Sure enough that line is handling 'O', where '--features' is supposed to be handled. No idea why -O works, but debugging shows that optarg is null at that point, so strdup goes poof.

-h
Re: btrfs-progs v3.14 mkfs.btrfs bug: --features long-option segfaults, -O short-option OK
On Wed, 16 Apr 2014 07:48:53 +, Holger Hoffstätte wrote:

On Wed, 16 Apr 2014 06:41:44 +, Duncan wrote:

Using btrfs-progs v3.14: mkfs.btrfs ... --features ... segfaults.

Can reproduce (also with glibc 2.19 on Gentoo ;-) and building with debug found:

(gdb) bt
#0 0x76f3aaea in strlen () from /lib64/libc.so.6
#1 0x76f3a82e in strdup () from /lib64/libc.so.6
#2 0x004213e8 in main (ac=2, av=0x7fffe308) at mkfs.c:1312

Sure enough that line is handling 'O', where '--features' is supposed to be handled. No idea why -O works, but debugging shows that optarg is null at that point, so strdup goes poof.

Well, duh! The long_options array does not declare that --features wants an argument, so getopt does not create one.

holger>diff -u mkfs.c\~ mkfs.c
--- mkfs.c~	2014-04-16 09:41:42.0 +0200
+++ mkfs.c	2014-04-16 09:51:45.408854096 +0200
@@ -350,7 +350,7 @@
 	{ "version", 0, NULL, 'V' },
 	{ "rootdir", 1, NULL, 'r' },
 	{ "nodiscard", 0, NULL, 'K' },
-	{ "features", 0, NULL, 'O' },
+	{ "features", 1, NULL, 'O' },
 	{ NULL, 0, NULL, 0}
 };

This fixes it.

-h
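The one-character fix above can be illustrated outside mkfs.c. The following is a minimal sketch, not the btrfs-progs code: `parse_features` is a hypothetical helper, and only two of the options are reproduced. With `has_arg` set to `required_argument` (1), `getopt_long()` fills `optarg` for `--features` just as the `O:` in the short-option string does for `-O`; with `has_arg` set to 0, `optarg` stays NULL and a later `strdup(optarg)` crashes as in the backtrace.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from mkfs.c): return the value passed to
 * -O / --features, or NULL when the option is absent.
 */
static const char *parse_features(int argc, char **argv)
{
	/* required_argument (1) is what makes getopt_long() set optarg. */
	static const struct option long_options[] = {
		{ "nodiscard", no_argument,       NULL, 'K' },
		{ "features",  required_argument, NULL, 'O' },
		{ NULL, 0, NULL, 0 }
	};
	const char *features = NULL;
	int c;

	optind = 1; /* restart scanning so the helper can be called again */
	while ((c = getopt_long(argc, argv, "KO:", long_options, NULL)) != -1) {
		if (c == 'O') {
			/* Guaranteed non-NULL because an argument is required. */
			assert(optarg != NULL);
			features = optarg;
		}
	}
	return features;
}
```

Both spellings then behave the same: `--features list-all` and `-O list-all` each yield the string `list-all`.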
[PATCH RFC] btrfs: Add ctime/mtime update for btrfs device add/remove.
Btrfs will send a uevent to udev to inform it of the device change, but the ctime/mtime of the block device inode is not updated. This leaves libblkid, which btrfs-progs uses, unable to detect the device change, so it uses its old cache, and 'btrfs dev scan; btrfs dev remove; btrfs dev scan' gives an error message.

Reported-by: Tsutomu Itoh <t-i...@jp.fujitsu.com>
Cc: Karel Zak <k...@redhat.com>
Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 49d7fab..ce232d7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1452,6 +1452,22 @@ out:
 	return ret;
 }
 
+/*
+ * Function to update ctime/mtime for a given device path.
+ * Mainly used for ctime/mtime based probe like libblkid.
+ */
+static void update_dev_time(char *path_name)
+{
+	struct file *filp;
+
+	filp = filp_open(path_name, O_RDWR, 0);
+	if (!filp)
+		return;
+	file_update_time(filp);
+	filp_close(filp, NULL);
+	return;
+}
+
 static int btrfs_rm_dev_item(struct btrfs_root *root,
 			     struct btrfs_device *device)
 {
@@ -1704,10 +1720,14 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 
 	ret = 0;
 
-	/* Notify udev that device has changed */
-	if (bdev)
+	if (bdev) {
+		/* Notify udev that device has changed */
 		btrfs_kobject_uevent(bdev, KOBJ_CHANGE);
 
+		/* Update ctime/mtime for device path for libblkid */
+		update_dev_time(device_path);
+	}
+
 error_brelse:
 	brelse(bh);
 	if (bdev)
@@ -2146,6 +2166,8 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 		ret = btrfs_commit_transaction(trans, root);
 	}
 
+	/* Update ctime/mtime for libblkid */
+	update_dev_time(device_path);
 	return ret;
 
 error_trans:
-- 
1.9.2
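To see why a stale mtime matters to a time-based probe cache, here is a minimal userspace sketch. It is an illustration only: `struct probe_cache_entry`, `entry_is_stale` and `device_cache_is_stale` are hypothetical names, and this is not libblkid's actual data structure or validation logic. The idea is simply that a cached probe result is trusted as long as the device node has not been modified since the probe, so a change that leaves mtime untouched is never noticed.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical cache entry; not libblkid's real structure. */
struct probe_cache_entry {
	time_t probed_at; /* when the device was last probed */
};

/* The entry is stale only if the device node changed after the probe. */
static bool entry_is_stale(const struct probe_cache_entry *e, time_t dev_mtime)
{
	return dev_mtime > e->probed_at;
}

/* Returns 1 if stale, 0 if still valid, -1 if the path cannot be stat'ed. */
static int device_cache_is_stale(const struct probe_cache_entry *e,
				 const char *dev_path)
{
	struct stat st;

	if (stat(dev_path, &st) != 0)
		return -1;
	return entry_is_stale(e, st.st_mtime) ? 1 : 0;
}
```

Under this model, a device add or remove that does not bump mtime leaves `entry_is_stale()` false, which corresponds to the "use old cache" symptom described in the patch.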
Re: [PATCH] Btrfs: implement inode_operations callback tmpfile
On Tuesday 01 Apr 2014 11:53:19 PM Filipe David Borba Manana wrote:

+static int btrfs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_root *root = BTRFS_I(dir)->root;
+	struct inode *inode = NULL;
+	u64 objectid;
+	u64 index;
+	int ret = 0;
+
+	/*
+	 * 2 for inode item and ref
+	 * 2 for dir items
+	 * 1 for xattr if selinux is on
+	 */
+	trans = btrfs_start_transaction(root, 5);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+

Hello,

Any particular reason to reserve space for 5 items? For the O_TMPFILE case we seem to allocate and use just the one inode item and none of the associated 'inode ref', 'dir item' and 'dir index item' since there is no directory entry associated with the file. I am not sure about the xattr item though.

Thanks,
chandan.
Re: [PATCH] Btrfs: implement inode_operations callback tmpfile
On Wed, Apr 16, 2014 at 10:00 AM, Chandan Rajendra <chan...@linux.vnet.ibm.com> wrote:

On Tuesday 01 Apr 2014 11:53:19 PM Filipe David Borba Manana wrote:

+static int btrfs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_root *root = BTRFS_I(dir)->root;
+	struct inode *inode = NULL;
+	u64 objectid;
+	u64 index;
+	int ret = 0;
+
+	/*
+	 * 2 for inode item and ref
+	 * 2 for dir items
+	 * 1 for xattr if selinux is on
+	 */
+	trans = btrfs_start_transaction(root, 5);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+

Hello,

Any particular reason to reserve space for 5 items? For the O_TMPFILE case we seem to allocate and use just the one inode item and none of the associated 'inode ref', 'dir item' and 'dir index item' since there is no directory entry associated with the file. I am not sure about the xattr item though.

Correct, not needed for directory entries. The xattr is needed for the case where an acl is inherited. And 5 units are required for orphan insertion (see comment on top of btrfs_orphan_add). I'll update the comment.

Thanks

Thanks,
chandan.

-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men.
Re: [PATCH] Btrfs: implement inode_operations callback tmpfile
On Wed, Apr 16, 2014 at 11:36:23AM +0100, Filipe David Manana wrote:

The xattr is needed for the case where an acl is inherited. And 5 units are required for orphan insertion (see comment on top of btrfs_orphan_add). I'll update the comment.

I don't think a tmpfile should inherit any ACL, as it does not have a parent. Anyway, we're currently having a discussion on that on various lists including linux-fsdevel.
Re: Copying a disk containing a btrfs filesystem
On Friday 11 April 2014 12:39:31 Brendan Hide wrote:

If you're 100% happy with your old disk's *content*/layout/etc (just not happy with the disk's reliability), try an overnight/over-weekend ddrescue instead

Thanks again to everyone who replied and especially for suggesting ddrescue. In the meantime, I've got a suitable new disk and copied the data to it using ddrescue (it took about 12 hours for 1TB). The disk had one fault extending over a few successive sectors.

For some reason, I had to repair the first partition, which I use for swap. The large second partition, with btrfs on it, seems fine. I ran btrfs scrub on it and it found one error:

Apr 16 09:48:27 fuchsia kernel: [ 6792.829186] btrfs: checksum error at logical 74443923456 on dev /dev/sdb2, sector 145398288, root 1228, inode 59102093, offset 929792, length 4096, links 1 (path: usr/lib/debug/.build-id/8f/b82df57b7b6fff7033f6abd7de914b82f98160.debug)

That file is part of a historical snapshot which I have since deleted, which presumably takes care of the problem.

Michael

-- 
Michael Schuerig
mailto:mich...@schuerig.de
http://www.schuerig.de/michael/
Re: [PATCH] Btrfs: implement inode_operations callback tmpfile
On Wed, Apr 16, 2014 at 12:09 PM, Christoph Hellwig <h...@infradead.org> wrote:

On Wed, Apr 16, 2014 at 11:36:23AM +0100, Filipe David Manana wrote:

The xattr is needed for the case where an acl is inherited. And 5 units are required for orphan insertion (see comment on top of btrfs_orphan_add). I'll update the comment.

I don't think think a tmpfile should inherit any ACL, as it does not have a parent. Anyway, we're currently having a discussion on that on various lists including linux-fsdevel.

Interesting Christoph. I was following the ext4 implementation initially. So it seems the question is still open, and none of the following alternatives is decided yet (unless I missed something in the thread at fsdevel):

1) inherit acl from directory passed to the tmpfile handler;
2) inherit acl at link time from directory we're linking to;
3) don't inherit acls

Thanks for the heads up.

-- 
Filipe David Manana
Re: [PATCH] Btrfs: implement inode_operations callback tmpfile
On Wed, Apr 16, 2014 at 12:25:07PM +0100, Filipe David Manana wrote:

Interesting Christoph. I was following the ext4 implementation initially. So it seems the question is still open, and none of the following alternatives is decided yet (unless I missed something in the thread at fsdevel):

1) inherit acl from directory passed to the tmpfile handler;
2) inherit acl at link time from directory we're linking to;
3) don't inherit acls

Thanks for the heads up.

I'll make sure we'll have a testcase in xfstests to cover whatever behavior we decide on to make sure all filesystems behave the same way.
How do I find the physical block number?
Hello,

I have created a 500GB partition on my HDD and formatted it for btrfs. I created a file on it.

# echo tmp data in the tmp file.. > /mnt/btrfs/tmp-file
# umount /mnt/btrfs

Next I want to know the blocks allocated for the file and I used filefrag for it. I get some information as follows -

# mount -o max_inline=0 /dev/sdc2 /mnt/btrfs
# filefrag -v /mnt/btrfs/tmp-file
Filesystem type is: 9123683e
File size of /mnt/btrfs/tmp-file is 27 (1 block, blocksize 4096)
 ext logical physical expected length flags
   0       0 65924123               1 eof
/mnt/btrfs/tmp-file: 1 extent found

Now, I want to read the same data from the disk directly. I tried the following -

block 65924123 = byte (65924123*4096) = 270025207808

# dd if=/dev/sdc2 of=tmp-file skip=270025207808 bs=1 count=4096
# cat tmp-file

I cannot read the file's contents, only some garbage.

I read somewhere that the physical block number shown in filefrag may actually be a logical block for the file system, and that it needs an additional translation to a physical block number. So next I tried the following -

# btrfs-map-logical -l 65924123 /dev/sdc2
mirror 1 logical 65924123 physical 74312731 device /dev/sdc2
mirror 2 logical 65924123 physical 1148054555 device /dev/sdc2

I again tried reading block 74312731 using the dd command as above, but it is still not the right block.

I want to know what the physical block number returned by filefrag means, why there are two mappings for the above block number, and how I can find the exact physical disk block number the file system actually writes to.

My sdc has the following partitions:

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048   419432447   209715200   83  Linux
/dev/sdc2      1468008448  2516584447   524288000   83  Linux (BTRFS)
/dev/sdc3       419432448  1468008447   524288000   83  Linux

Thanks,
Aastha.
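The block-to-byte arithmetic in the question can be checked with a short sketch. `block_to_byte` and `read_block_at` are hypothetical helper names, and the sketch only covers the unit conversion and the raw read; it does not perform the btrfs logical-to-physical translation. One hedged observation: as far as I know, the `-l` argument of btrfs-map-logical is a logical address in bytes, so passing the value 270025207808 rather than the block number 65924123 may be what the lookup needs.

```c
#define _FILE_OFFSET_BITS 64 /* 64-bit off_t on 32-bit systems too */

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Convert a block number to a byte offset using 64-bit arithmetic. */
static uint64_t block_to_byte(uint64_t block, uint32_t block_size)
{
	return block * (uint64_t)block_size;
}

/*
 * Read 'len' bytes at byte offset 'byte_off' of a device or image file.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_block_at(const char *path, uint64_t byte_off,
			     void *buf, size_t len)
{
	int fd = open(path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	/* pread() seeks and reads in one call, unlike dd with bs=1. */
	n = pread(fd, buf, len, (off_t)byte_off);
	close(fd);
	return n;
}
```

For the numbers in the message, `block_to_byte(65924123, 4096)` gives 270025207808, which matches the value passed to dd as `skip=` with `bs=1`.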
Re: [PATCH 1/2] btrfs: protect snapshots from deleting during send
On Tue, Apr 15, 2014 at 01:21:46PM -0400, Chris Mason wrote:

On 04/15/2014 12:27 PM, David Sterba wrote:

On Tue, Apr 15, 2014 at 12:00:49PM -0400, Chris Mason wrote:

I'm worried about the use case where we have:

* periodic automated snapshots
* periodic automated deletion of old snapshots
* periodic send for backup

The automated deletion doesn't want to error out if send is in progress, it just wants the deletion to happen in the background.

I'd give the precedence to the 'backup' process before the 'clean old snapshots', because it can do more harm if the snapshot is removed meanwhile without any possibility to recover.

Right, we don't want either process to stop with an error. We just want them to continue happily and do the right thing...

... if everything goes without errors. Not like send going out of memory, send through network has a glitch, send to a file runs out of space, and has to be restarted. Is this too unrealistic to happen?

It's a good point, a better way to say what I have in mind is that we shouldn't be adding new transient errors to the send process (on purpose ;)

Ok, I got a wrong understanding from your previous reply.

So the next thing in the list to try is to add tunables affecting delete vs send. Something like

$ btrfs send --protect-subvols -c clone1 -c clone2 source

that would disallow to delete clone1, clone2 and source, passed to the ioctl as a flag and internally adding another refcount for 'how many times it has been protected'. Sounds ugly, but would cover all possible combinations of sending with or without deletion protection.
Re: [PATCH 1/2] btrfs: protect snapshots from deleting during send
On 04/16/2014 09:32 AM, David Sterba wrote:

On Tue, Apr 15, 2014 at 01:21:46PM -0400, Chris Mason wrote:

On 04/15/2014 12:27 PM, David Sterba wrote:

On Tue, Apr 15, 2014 at 12:00:49PM -0400, Chris Mason wrote:

I'm worried about the use case where we have:

* periodic automated snapshots
* periodic automated deletion of old snapshots
* periodic send for backup

The automated deletion doesn't want to error out if send is in progress, it just wants the deletion to happen in the background.

I'd give the precedence to the 'backup' process before the 'clean old snapshots', because it can do more harm if the snapshot is removed meanwhile without any possibility to recover.

Right, we don't want either process to stop with an error. We just want them to continue happily and do the right thing...

... if everything goes without errors. Not like send going out of memory, send through network has a glitch, send to a file runs out of space, and has to be restarted. Is this too unrealistic to happen?

It's a good point, a better way to say what I have in mind is that we shouldn't be adding new transient errors to the send process (on purpose ;)

Ok, I got a wrong understanding from your previous reply.

So the next thing in the list to try is to add tunables affecting delete vs send. Something like

$ btrfs send --protect-subvols -c clone1 -c clone2 source

that would disallow to delete clone1, clone2 and source, passed to the ioctl as a flag and internally adding another refcount for 'how many times it has been protected'. Sounds ugly, but would cover all possible combinations of sending with or without deletion protection.

Ok, I reread the patch and your original point about dealing with a send + delete + network interruption. That's the part I didn't catch the first time around.

So in my example with the automated tool, the tool really shouldn't be deleting a snapshot where send is in progress. The tool should be told that snapshot is busy and try to delete it again later.

It makes more sense now, I'll queue this up for 3.16 and we can try it out in -next.

-chris
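The "told that snapshot is busy and try again later" behaviour can be sketched as a counter that running sends hold on a subvolume. This is a simplified illustration, not the kernel patch: `struct subvol`, `send_begin`, `send_end` and `subvol_delete` are hypothetical names, and real kernel code would need locking around the counter.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a subvolume; not the kernel's struct btrfs_root. */
struct subvol {
	int send_in_progress; /* number of running sends using this subvolume */
	int deleted;          /* set once deletion has been accepted */
};

/* A send takes a reference on every subvolume it reads from. */
static void send_begin(struct subvol *s)
{
	s->send_in_progress++;
}

/* The reference is dropped when the send finishes or fails. */
static void send_end(struct subvol *s)
{
	assert(s->send_in_progress > 0);
	s->send_in_progress--;
}

/* Deletion is refused with -EBUSY while any send still holds the subvolume. */
static int subvol_delete(struct subvol *s)
{
	if (s->send_in_progress > 0)
		return -EBUSY;
	s->deleted = 1;
	return 0;
}
```

An automated cleanup tool built on this would treat `-EBUSY` as "retry on the next run" rather than as a failure.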
[PATCH 1/4 v2] Btrfs: send, bump stream version
This increases the send stream version from version 1 to version 2, adding 2 new commands:

1) total data size - used to tell the receiver how much file data the stream will add or update;

2) fallocate - used to pre-allocate space for files and to punch holes in files.

This is preparation work for subsequent changes that implement the new features (computing total data size and use fallocate for better performance).

A version 2 stream is only produced if the send ioctl caller passes in one of the new flags (BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE | BTRFS_SEND_FLAG_SUPPORT_FALLOCATE), meaning old clients are unaffected.

Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
---

V2: A v2 stream is now only produced if the send ioctl caller passes in one of the new flags (BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE | BTRFS_SEND_FLAG_SUPPORT_FALLOCATE) to avoid breaking old clients.

 fs/btrfs/send.c            |  6 +++++-
 fs/btrfs/send.h            | 14 +++++++++++++-
 include/uapi/linux/btrfs.h | 24 +++++++++++++++++++++++-
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 289e9f3..53712aa 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -632,7 +632,11 @@ static int send_header(struct send_ctx *sctx)
 	struct btrfs_stream_header hdr;
 
 	strcpy(hdr.magic, BTRFS_SEND_STREAM_MAGIC);
-	hdr.version = cpu_to_le32(BTRFS_SEND_STREAM_VERSION);
+	if (sctx->flags & (BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE |
+			   BTRFS_SEND_FLAG_SUPPORT_FALLOCATE))
+		hdr.version = cpu_to_le32(BTRFS_SEND_STREAM_VERSION_2);
+	else
+		hdr.version = cpu_to_le32(BTRFS_SEND_STREAM_VERSION_1);
 
 	return write_buf(sctx->send_filp, &hdr, sizeof(hdr),
 					&sctx->send_off);
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index 48d425a..367030d 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -20,7 +20,8 @@
 #include "ctree.h"
 
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
-#define BTRFS_SEND_STREAM_VERSION 1
+#define BTRFS_SEND_STREAM_VERSION_1 1
+#define BTRFS_SEND_STREAM_VERSION_2 2
 
 #define BTRFS_SEND_BUF_SIZE (1024 * 64)
 #define BTRFS_SEND_READ_SIZE (1024 * 48)
@@ -87,6 +88,11 @@ enum btrfs_send_cmd {
 
 	BTRFS_SEND_C_END,
 	BTRFS_SEND_C_UPDATE_EXTENT,
+
+	/* added in stream version 2 */
+	BTRFS_SEND_C_TOTAL_DATA_SIZE,
+	BTRFS_SEND_C_FALLOCATE,
+
 	__BTRFS_SEND_C_MAX,
 };
 #define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1)
@@ -125,10 +131,16 @@ enum {
 	BTRFS_SEND_A_CLONE_OFFSET,
 	BTRFS_SEND_A_CLONE_LEN,
 
+	/* added in stream version 2 */
+	BTRFS_SEND_A_FALLOCATE_FLAGS,
+
 	__BTRFS_SEND_A_MAX,
 };
 #define BTRFS_SEND_A_MAX (__BTRFS_SEND_A_MAX - 1)
 
+#define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE   (1 << 0)
+#define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE  (1 << 1)
+
 #ifdef __KERNEL__
 long btrfs_ioctl_send(struct file *mnt_file, void __user *arg);
 #endif
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index b4d6909..6611406 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -464,10 +464,32 @@ struct btrfs_ioctl_received_subvol_args {
  */
 #define BTRFS_SEND_FLAG_OMIT_END_CMD		0x4
 
+/*
+ * Calculate the amount (in bytes) of new file data between the send and
+ * parent snapshots, or in case of a full send, the total amount of file data
+ * we will send.
+ * This corresponds to the sum of the data lengths of each write, clone and
+ * fallocate commands that are sent through the send stream. The receiving end
+ * can use this information to compute progress.
+ *
+ * Added in send stream version 2.
+ */
+#define BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE	0x8
+
+/*
+ * Use fallocate command to pre-allocate file extents and punch file holes,
+ * instead of write commands with data buffers filled with 0 value bytes.
+ *
+ * Added in send stream version 2.
+ */
+#define BTRFS_SEND_FLAG_SUPPORT_FALLOCATE	0x10
+
 #define BTRFS_SEND_FLAG_MASK \
 	(BTRFS_SEND_FLAG_NO_FILE_DATA | \
 	 BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \
-	 BTRFS_SEND_FLAG_OMIT_END_CMD)
+	 BTRFS_SEND_FLAG_OMIT_END_CMD | \
+	 BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE | \
+	 BTRFS_SEND_FLAG_SUPPORT_FALLOCATE)
 
 struct btrfs_ioctl_send_args {
 	__s64 send_fd;			/* in */
-- 
1.9.1
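The version choice made in send_header() can be exercised on its own. The sketch below reuses the two flag values from the patch (0x8 and 0x10); `send_stream_version` is a hypothetical helper written for illustration, not a function from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as given in the patch. */
#define BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE 0x8
#define BTRFS_SEND_FLAG_SUPPORT_FALLOCATE   0x10

/*
 * Hypothetical helper mirroring the version choice in send_header():
 * either new flag selects stream version 2, anything else stays at 1.
 */
static uint32_t send_stream_version(uint64_t flags)
{
	if (flags & (BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE |
		     BTRFS_SEND_FLAG_SUPPORT_FALLOCATE))
		return 2;
	return 1;
}
```

A caller that passes only the pre-existing flags (for example 0x4, BTRFS_SEND_FLAG_OMIT_END_CMD) still gets a version 1 stream, which is the compatibility property the V2 note describes.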
[PATCH 3/4 v2] Btrfs: send, use fallocate command to punch holes
Instead of sending a write command with a data buffer filled with 0 value bytes, use the fallocate command, introduced in the send stream version 2, to tell the receiver to punch a file hole using the fallocate system call.

Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
---

V2: A v2 stream is now only produced if the send ioctl caller passes in one of the new flags (BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE | BTRFS_SEND_FLAG_SUPPORT_FALLOCATE) to avoid breaking old clients.

 fs/btrfs/send.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/send.h |  4 ++++
 2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f5db492..2c6d58c 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -564,6 +564,7 @@ static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)
 		return tlv_put(sctx, attr, &__tmp, sizeof(__tmp)); \
 	}
 
+TLV_PUT_DEFINE_INT(32)
 TLV_PUT_DEFINE_INT(64)
 
 static int tlv_put_string(struct send_ctx *sctx, u16 attr,
@@ -4483,15 +4484,16 @@ out:
 	return ret;
 }
 
-static int send_hole(struct send_ctx *sctx, u64 end)
+static int send_fallocate(struct send_ctx *sctx, u32 flags,
+			  u64 offset, u64 len)
 {
 	struct fs_path *p = NULL;
-	u64 offset = sctx->cur_inode_last_extent;
-	u64 len;
 	int ret = 0;
 
+	ASSERT(sctx->flags & BTRFS_SEND_FLAG_SUPPORT_FALLOCATE);
+
 	if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE) {
-		sctx->total_data_size += end - offset;
+		sctx->total_data_size += len;
 		return 0;
 	}
 
@@ -4500,6 +4502,43 @@ static int send_hole(struct send_ctx *sctx, u64 end)
 		return -ENOMEM;
 	ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
 	if (ret < 0)
+		goto out;
+
+	ret = begin_cmd(sctx, BTRFS_SEND_C_FALLOCATE);
+	if (ret < 0)
+		goto out;
+	TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
+	TLV_PUT_U32(sctx, BTRFS_SEND_A_FALLOCATE_FLAGS, flags);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len);
+	ret = send_cmd(sctx);
+
+tlv_put_failure:
+out:
+	fs_path_free(p);
+	return ret;
+}
+
+static int send_hole(struct send_ctx *sctx, u64 end)
+{
+	struct fs_path *p = NULL;
+	u64 offset = sctx->cur_inode_last_extent;
+	u64 len = end - offset;
+	int ret = 0;
+
+	if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE) {
+		sctx->total_data_size += len;
+		return 0;
+	}
+
+	if (sctx->flags & BTRFS_SEND_FLAG_SUPPORT_FALLOCATE)
+		return send_fallocate(sctx,
+				      BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS,
+				      offset,
+				      len);
+
+	ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
+	if (ret < 0)
 		goto tlv_put_failure;
 	memset(sctx->read_buf, 0, BTRFS_SEND_READ_SIZE);
 	while (offset < end) {
@@ -4551,7 +4590,8 @@ static int send_write_or_clone(struct send_ctx *sctx,
 		len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
 	}
 
-	if (offset + len > sctx->cur_inode_size)
+	if (offset < sctx->cur_inode_size &&
+	    offset + len > sctx->cur_inode_size)
 		len = sctx->cur_inode_size - offset;
 	if (len == 0) {
 		ret = 0;
@@ -4568,6 +4608,12 @@ static int send_write_or_clone(struct send_ctx *sctx,
 		ret = send_clone(sctx, offset, len, clone_root);
 	} else if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA) {
 		ret = send_update_extent(sctx, offset, len);
+	} else if (btrfs_file_extent_disk_bytenr(path->nodes[0], ei) == 0 &&
+		   type != BTRFS_FILE_EXTENT_INLINE &&
+		   (sctx->flags & BTRFS_SEND_FLAG_SUPPORT_FALLOCATE) &&
+		   offset < sctx->cur_inode_size) {
+		ret = send_fallocate(sctx, BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS,
+				     offset, len);
 	} else {
 		while (pos < len) {
 			l = len - pos;
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index 367030d..a632c0d 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -141,6 +141,10 @@ enum {
 #define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE   (1 << 0)
 #define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE  (1 << 1)
 
+#define BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS \
+	(BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE | \
+	 BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE)
+
 #ifdef __KERNEL__
 long btrfs_ioctl_send(struct file *mnt_file, void __user *arg);
 #endif
-- 
1.9.1
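On the receiving side, the stream's fallocate flags have to be turned into a mode for the fallocate system call. The following is a sketch of one way that translation could look, assuming the two stream flag values from the patch; `stream_flags_to_falloc_mode` is a hypothetical helper, not the btrfs-progs implementation. Note that fallocate(2) requires FALLOC_FL_PUNCH_HOLE to be combined with FALLOC_FL_KEEP_SIZE, which is consistent with the patch always sending both bits in BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/falloc.h> /* FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE */

/* Stream attribute flag values as given in the patch. */
#define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE  (1 << 0)
#define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE (1 << 1)

/*
 * Hypothetical receiver-side helper: translate stream flags into a
 * fallocate(2) mode. Returns -1 if the stream carries unknown bits.
 */
static int stream_flags_to_falloc_mode(uint32_t flags)
{
	const uint32_t known = BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE |
			       BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE;
	int mode = 0;

	if (flags & ~known)
		return -1;
	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE)
		mode |= FALLOC_FL_KEEP_SIZE;
	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE)
		mode |= FALLOC_FL_PUNCH_HOLE;
	return mode;
}
```

Translating explicitly, rather than passing the stream value straight through, keeps the on-wire format independent of the host's flag encoding and lets the receiver reject bits it does not understand.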
[PATCH 4/4 v2] Btrfs: send, use fallocate command to allocate extents
The send stream version 2 adds the fallocate command, which can be used to allocate extents for a file or punch holes in a file. Previously we were ignoring file prealloc extents or treating them as extents filled with 0 bytes and sending a regular write command to the stream.

After this change, together with my previous change titled:

    "Btrfs: send, use fallocate command to punch holes"

an incremental send preserves the hole and data structure of files, which can be seen via calls to lseek with the whence parameter set to SEEK_DATA or SEEK_HOLE, as the example below shows:

mkfs.btrfs -f /dev/sdc
mount /dev/sdc /mnt
xfs_io -f -c "pwrite -S 0x01 -b 30 0 30" /mnt/foo
btrfs subvolume snapshot -r /mnt /mnt/mysnap1

xfs_io -c "fpunch 10 5" /mnt/foo
xfs_io -c "falloc 10 5" /mnt/foo
xfs_io -c "pwrite -S 0xff -b 1000 12 1000" /mnt/foo
xfs_io -c "fpunch 25 2" /mnt/foo

# prealloc extents that start beyond the inode's size
xfs_io -c "falloc -k 30 100" /mnt/foo
xfs_io -c "falloc -k 900 200" /mnt/foo

btrfs subvolume snapshot -r /mnt /mnt/mysnap2

btrfs send /mnt/mysnap1 -f /tmp/1.snap
btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/2.snap

mkfs.btrfs -f /dev/sdd
mount /dev/sdd /mnt2
btrfs receive /mnt2 -f /tmp/1.snap
btrfs receive /mnt2 -f /tmp/2.snap

Before this change the hole/data structure differed between both filesystems:

$ xfs_io -r -c 'seek -r -a 0' /mnt/mysnap2/foo
Whence	Result
DATA	0
HOLE	102400
DATA	118784
HOLE	122880
DATA	147456
HOLE	253952
DATA	266240
HOLE	30

$ xfs_io -r -c 'seek -r -a 0' /mnt2/mysnap2/foo
Whence	Result
DATA	0
HOLE	30

After this change the second filesystem (/dev/sdd) ends up with the same hole/data structure as the first filesystem.

Also, after this change, prealloc extents that lie beyond the inode's size (were allocated with fallocate + keep size flag) are also replicated by an incremental send. For the above test, it can be observed via fiemap (or btrfs-debug-tree):

$ xfs_io -r -c 'fiemap -l' /mnt2/mysnap2/foo
0: [0..191]: 25096..25287 192 blocks
1: [192..199]: 24672..24679 8 blocks
2: [200..231]: 24584..24615 32 blocks
3: [232..239]: 24680..24687 8 blocks
4: [240..287]: 24616..24663 48 blocks
5: [288..295]: 24688..24695 8 blocks
6: [296..487]: 25392..25583 192 blocks
7: [488..495]: 24696..24703 8 blocks
8: [496..519]: hole 24 blocks
9: [520..527]: 24704..24711 8 blocks
10: [528..583]: 25624..25679 56 blocks
11: [584..591]: 24712..24719 8 blocks
12: [592..2543]: 26192..28143 1952 blocks
13: [2544..17575]: hole 15032 blocks
14: [17576..21487]: 28144..32055 3912 blocks

A test case for xfstests will follow.

Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
---

V2: A v2 stream is now only produced if the send ioctl caller passes in one of the new flags (BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE | BTRFS_SEND_FLAG_SUPPORT_FALLOCATE) to avoid breaking old clients.

 fs/btrfs/send.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 2c6d58c..043fd43 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -113,9 +113,10 @@ struct send_ctx {
 	 */
 	u64 cur_ino;
 	u64 cur_inode_gen;
-	int cur_inode_new;
-	int cur_inode_new_gen;
-	int cur_inode_deleted;
+	u8 cur_inode_new:1;
+	u8 cur_inode_new_gen:1;
+	u8 cur_inode_skip_truncate:1;
+	u8 cur_inode_deleted:1;
 	u64 cur_inode_size;
 	u64 cur_inode_mode;
 	u64 cur_inode_rdev;
@@ -4599,8 +4600,7 @@ static int send_write_or_clone(struct send_ctx *sctx,
 	}
 
 	if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE) {
-		if (offset < sctx->cur_inode_size)
-			sctx->total_data_size += len;
+		sctx->total_data_size += len;
 		goto out;
 	}
 
@@ -4614,6 +4614,27 @@ static int send_write_or_clone(struct send_ctx *sctx,
 		   offset < sctx->cur_inode_size) {
 		ret = send_fallocate(sctx, BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS,
 				     offset, len);
+	} else if (type == BTRFS_FILE_EXTENT_PREALLOC &&
+		   (sctx->flags & BTRFS_SEND_FLAG_SUPPORT_FALLOCATE)) {
+		u32 mode = 0;
+		if (offset < sctx->cur_inode_size) {
+			ret = send_fallocate(sctx,
+					     BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS,
+					     offset, len);
+			if (ret)
+				goto out;
+		} else {
+			if (!sctx->cur_inode_skip_truncate) {
+
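The hole/data layout that the commit message inspects with xfs_io can also be walked directly with lseek. The sketch below is an illustration under assumptions, not the xfs_io implementation: `print_segments` is a hypothetical helper, the fallback definitions of SEEK_DATA and SEEK_HOLE use the Linux values, and the output format only loosely follows the "Whence Result" listing (the trailing hole at end of file is not printed).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only if the libc headers do not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Print the start offset of every data and hole segment of an open file.
 * Returns the number of data segments found, or -1 on error.
 */
static int print_segments(int fd)
{
	off_t end = lseek(fd, 0, SEEK_END);
	off_t pos = 0;
	int data_segments = 0;

	if (end < 0)
		return -1;
	while (pos < end) {
		off_t data = lseek(fd, pos, SEEK_DATA);

		if (data < 0) {
			if (errno != ENXIO)
				return -1;
			/* ENXIO: no data after pos, the rest is a hole. */
			printf("HOLE\t%lld\n", (long long)pos);
			break;
		}
		if (data > pos)
			printf("HOLE\t%lld\n", (long long)pos);
		printf("DATA\t%lld\n", (long long)data);
		data_segments++;

		/* The next hole (or end of file) ends this data segment. */
		pos = lseek(fd, data, SEEK_HOLE);
		if (pos < 0)
			return -1;
	}
	return data_segments;
}
```

On filesystems without native support for these whence values, the kernel's generic fallback reports the whole file as a single data segment, so the result depends on the filesystem holding the file.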
[PATCH 1/4 v2] Btrfs-progs: send, bump stream version
This increases the send stream version from version 1 to version 2,
adding 2 new commands:

1) total data size - used to tell the receiver how much file data the
   stream will add or update;

2) fallocate - used to pre-allocate space for files and to punch holes
   in files.

This is preparation work for subsequent changes that implement the new
features (computing total data size and using fallocate for better
performance).

This doesn't break compatibility with older kernels or clients. In order
to get a version 2 send stream, new flags must be passed to the send
ioctl.

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: Added new send ioctl flag BTRFS_SEND_FLAG_SUPPORT_FALLOCATE. A version 2
    stream is now only produced if the ioctl caller specifies at least one of
    the new send flags (BTRFS_SEND_FLAG_SUPPORT_FALLOCATE or
    BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE).

 ioctl.h | 18 ++
 send.h  | 13 -
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/ioctl.h b/ioctl.h
index 231660a..e2c506b 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -392,6 +392,24 @@ struct btrfs_ioctl_received_subvol_args {
  */
 #define BTRFS_SEND_FLAG_OMIT_END_CMD 0x4
 
+/*
+ * The sum of all length fields the receiver will get in write, clone and
+ * fallocate commands.
+ * This can be used by the receiver to compute progress, at the expense of some
+ * initial metadata scan performed by the sender (kernel).
+ *
+ * Added in send stream version 2.
+ */
+#define BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE	0x8
+
+/*
+ * Use fallocate command to pre-allocate file extents and punch file holes,
+ * instead of write commands with data buffers filled with 0 value bytes.
+ *
+ * Added in send stream version 2.
+ */
+#define BTRFS_SEND_FLAG_SUPPORT_FALLOCATE	0x10
+
 struct btrfs_ioctl_send_args {
 	__s64 send_fd;			/* in */
 	__u64 clone_sources_count;	/* in */
diff --git a/send.h b/send.h
index e8da785..69e81fb 100644
--- a/send.h
+++ b/send.h
@@ -24,7 +24,7 @@ extern "C" {
 #endif
 
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
-#define BTRFS_SEND_STREAM_VERSION 1
+#define BTRFS_SEND_STREAM_VERSION 2
 
 #define BTRFS_SEND_BUF_SIZE (1024 * 64)
 #define BTRFS_SEND_READ_SIZE (1024 * 48)
@@ -91,6 +91,11 @@ enum btrfs_send_cmd {
 	BTRFS_SEND_C_END,
 	BTRFS_SEND_C_UPDATE_EXTENT,
+
+	/* added in stream version 2 */
+	BTRFS_SEND_C_TOTAL_DATA_SIZE,
+	BTRFS_SEND_C_FALLOCATE,
+
 	__BTRFS_SEND_C_MAX,
 };
 #define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1)
@@ -129,10 +134,16 @@ enum {
 	BTRFS_SEND_A_CLONE_OFFSET,
 	BTRFS_SEND_A_CLONE_LEN,
 
+	/* added in stream version 2 */
+	BTRFS_SEND_A_FALLOCATE_FLAGS,
+
 	__BTRFS_SEND_A_MAX,
 };
 #define BTRFS_SEND_A_MAX (__BTRFS_SEND_A_MAX - 1)
 
+#define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE   (1 << 0)
+#define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE  (1 << 1)
+
 #ifdef __KERNEL__
 long btrfs_ioctl_send(struct file *mnt_file, void __user *arg);
 #endif
-- 
1.9.1
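The compatibility rule from the changelog (a version 2 stream is produced only when the caller passes at least one of the new flags) can be sketched roughly as follows. The flag values are the ones defined in the ioctl.h hunk of this patch; the helper name send_stream_version_for_flags is hypothetical and exists only for illustration, it is not part of btrfs-progs or the kernel.

```c
#include <stdint.h>

/* Flag values as defined in the ioctl.h hunk of this patch. */
#define BTRFS_SEND_FLAG_OMIT_END_CMD        0x4
#define BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE 0x8
#define BTRFS_SEND_FLAG_SUPPORT_FALLOCATE   0x10

/*
 * Hypothetical helper (illustration only): return the stream version a
 * sender would emit for the given send ioctl flags. Old callers never
 * set the new flags, so they keep getting a version 1 stream.
 */
static int send_stream_version_for_flags(uint64_t flags)
{
	const uint64_t v2_flags = BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE |
				  BTRFS_SEND_FLAG_SUPPORT_FALLOCATE;

	return (flags & v2_flags) ? 2 : 1;
}
```

Under this reading, a caller that only sets BTRFS_SEND_FLAG_OMIT_END_CMD still receives a version 1 stream.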
[PATCH 2/4 v2] Btrfs-progs: send, implement total data size callback and progress report
This is a followup to the kernel patch titled:

    Btrfs: send, implement total data size command to allow for progress estimation

This makes the btrfs send and receive commands aware of the new send
command, named BTRFS_SEND_C_TOTAL_DATA_SIZE, which tells us the amount of
file data that is new between the parent and send snapshots/roots. As this
command immediately follows the commands to start a snapshot/subvolume, it
can be used to compute and report progress, by keeping a counter that is
incremented with the data length of each write, clone and fallocate command
received from the stream.

Example:

    $ btrfs send -o /mnt/sdd/snap_base | btrfs receive /mnt/sdc
    At subvol /mnt/sdd/snap_base
    At subvol snap_base
    About to receive 9212392667 bytes
    Subvolume /mnt/sdc//snap_base, 4059722426 / 9212392667 bytes received, 44.07%, 40.32MB/s

    $ btrfs send -o -p /mnt/sdd/snap_base /mnt/sdd/snap_incr | btrfs receive /mnt/sdc
    At subvol /mnt/sdd/snap_incr
    At subvol snap_incr
    About to receive 9571342213 bytes
    Subvolume /mnt/sdc//snap_incr, 6557345221 / 9571342213 bytes received, 68.51%, 51.04MB/s

At the moment progress is only reported by btrfs-receive, but it is possible
and simple to do it for btrfs-send too, so that progress can also be reported
when btrfs-send output is written directly to a file instead of being piped
to btrfs-receive.

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: Added new send ioctl flag BTRFS_SEND_FLAG_SUPPORT_FALLOCATE. A version 2
    stream is now only produced if the ioctl caller specifies at least one of
    the new send flags (BTRFS_SEND_FLAG_SUPPORT_FALLOCATE or
    BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE).

Documentation/btrfs-send.txt | 3 ++ cmds-receive.c | 91 cmds-send.c | 14 ++- send-stream.c| 4 ++ send-stream.h| 1 + 5 files changed, 111 insertions(+), 2 deletions(-) diff --git a/Documentation/btrfs-send.txt b/Documentation/btrfs-send.txt index 18a98fa..38470b0 100644 --- a/Documentation/btrfs-send.txt +++ b/Documentation/btrfs-send.txt @@ -40,6 +40,9 @@ Use this snapshot as a clone source for an incremental send (multiple allowed). -f outfile:: Output is normally written to stdout. To write to a file, use this option. An alternative would be to use pipes. +-o:: +Obtain the total data size for each subvolume or snapshot to send. This demands additional +processing (mostly IO bound) but is useful for the receive command to report progress. EXIT STATUS --- diff --git a/cmds-receive.c b/cmds-receive.c index d6cd3da..19300fc 100644 --- a/cmds-receive.c +++ b/cmds-receive.c @@ -32,6 +32,7 @@ #include ftw.h #include wait.h #include assert.h +#include time.h #include sys/stat.h #include sys/types.h @@ -71,6 +72,14 @@ struct btrfs_receive struct subvol_uuid_search sus; int honor_end_cmd; + + /* For the subvolume/snapshot we're currently receiving. 
*/ + u64 total_data_size; + u64 bytes_received; + time_t last_progress_update; + u64 bytes_received_last_update; + float progress; + const char *target; }; static int finish_subvol(struct btrfs_receive *r) @@ -156,6 +165,12 @@ static int process_subvol(const char *path, const u8 *uuid, u64 ctransid, goto out; r-cur_subvol = calloc(1, sizeof(*r-cur_subvol)); + r-total_data_size = 0; + r-bytes_received = 0; + r-progress = 0.0; + r-last_progress_update = 0; + r-bytes_received_last_update = 0; + r-target = Subvolume; if (strlen(r-dest_dir_path) == 0) r-cur_subvol-path = strdup(path); @@ -205,6 +220,12 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid, goto out; r-cur_subvol = calloc(1, sizeof(*r-cur_subvol)); + r-total_data_size = 0; + r-bytes_received = 0; + r-progress = 0.0; + r-last_progress_update = 0; + r-bytes_received_last_update = 0; + r-target = Snapshot; if (strlen(r-dest_dir_path) == 0) r-cur_subvol-path = strdup(path); @@ -287,6 +308,73 @@ out: return ret; } +static int process_total_data_size(u64 size, void *user) +{ + struct btrfs_receive *r = user; + + r-total_data_size = size; + fprintf(stdout, About to receive %llu bytes\n, size); + + return 0; +} + +static void update_progress(struct btrfs_receive *r, u64 bytes) +{ + float new_progress; + time_t now; + time_t tdiff; + + if (r-total_data_size == 0) + return; + + r-bytes_received += bytes; + + now = time(NULL); + tdiff = now - r-last_progress_update; + if (tdiff 1) { + if (r-bytes_received == r-total_data_size) + fprintf(stdout, \n); + return; + } + + new_progress =
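The patch text is cut off above, so the following is only a minimal sketch of the progress arithmetic the description implies, not the code from the patch. The struct and function names are hypothetical; the field names loosely follow the ones the patch adds to struct btrfs_receive.

```c
#include <stdint.h>

/* Hypothetical state, loosely modelled on the fields added by the patch. */
struct progress_state {
	uint64_t total_data_size; /* value from the total data size command */
	uint64_t bytes_received;  /* sum of write/clone/fallocate lengths */
};

/*
 * Account for one write, clone or fallocate command of the given length.
 * As in the patch, nothing is tracked when no total was announced.
 */
static void progress_account(struct progress_state *p, uint64_t len)
{
	if (p->total_data_size == 0)
		return;
	p->bytes_received += len;
}

/* Percentage complete, or 0 when the sender did not announce a total. */
static double progress_percent(const struct progress_state *p)
{
	if (p->total_data_size == 0)
		return 0.0;
	return 100.0 * (double)p->bytes_received / (double)p->total_data_size;
}
```

With the numbers from the first example in the commit message (4059722426 of 9212392667 bytes), this yields roughly 44.07%.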
[PATCH 3/4 v2] Btrfs-progs: send, implement fallocate command callback
The fallocate send stream command, added in stream version 2, is used to pre-allocate space for files and punch file holes. This change implements the callback for that new command, using the fallocate function from the standard C library to carry out the specified action (allocate file space or punch a file hole). Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: Use the new send ioctl flag BTRFS_SEND_FLAG_SUPPORT_FALLOCATE if the user asks for it (-a command line option), which will make the kernel generate a version 2 send stream, so that old clients aren't affected. Documentation/btrfs-send.txt | 3 +++ cmds-receive.c | 38 ++ cmds-send.c | 12 ++-- send-stream.c| 13 + send-stream.h| 2 ++ 5 files changed, 66 insertions(+), 2 deletions(-) diff --git a/Documentation/btrfs-send.txt b/Documentation/btrfs-send.txt index 38470b0..e96be07 100644 --- a/Documentation/btrfs-send.txt +++ b/Documentation/btrfs-send.txt @@ -43,6 +43,9 @@ An alternative would be to use pipes. -o:: Obtain the total data size for each subvolume or snapshot to send. This demands additional processing (mostly IO bound) but is useful for the receive command to report progress. +-a:: +Use fallocate to pre-allocate file extents and to punch file holes, instead of writing zeroes +to files. 
EXIT STATUS --- diff --git a/cmds-receive.c b/cmds-receive.c index 19300fc..3f30066 100644 --- a/cmds-receive.c +++ b/cmds-receive.c @@ -41,6 +41,7 @@ #include sys/types.h #include sys/xattr.h #include uuid/uuid.h +#include linux/falloc.h #include ctree.h #include ioctl.h @@ -887,6 +888,42 @@ out: return ret; } +static int process_fallocate(const char *path, u32 flags, u64 offset, +u64 len, void *user) +{ + struct btrfs_receive *r = user; + char *full_path = path_cat(r-full_subvol_path, path); + int mode = 0; + int ret; + + if (flags BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE) + mode |= FALLOC_FL_KEEP_SIZE; + if (flags BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE) + mode |= FALLOC_FL_PUNCH_HOLE; + + if (g_verbose = 2) + fprintf(stderr, + fallocate %s - flags %u, offset %llu, len %llu\n, + path, flags, offset, len); + + ret = open_inode_for_write(r, full_path); + if (ret 0) + goto out; + + ret = fallocate(r-write_fd, mode, offset, len); + if (ret) { + ret = -errno; + fprintf(stderr, + ERROR: fallocate against %s failed. 
%s\n, + path, strerror(-ret)); + goto out; + } + update_progress(r, len); + +out: + free(full_path); + return ret; +} static struct btrfs_send_ops send_ops = { .subvol = process_subvol, @@ -910,6 +947,7 @@ static struct btrfs_send_ops send_ops = { .chown = process_chown, .utimes = process_utimes, .total_data_size = process_total_data_size, + .fallocate = process_fallocate, }; static int do_receive(struct btrfs_receive *r, const char *tomnt, int r_fd) diff --git a/cmds-send.c b/cmds-send.c index 69f5ba1..2a62e68 100644 --- a/cmds-send.c +++ b/cmds-send.c @@ -46,6 +46,7 @@ static int g_verbose = 0; static int g_total_data_size = 0; +static int g_fallocate = 0; struct btrfs_send { int send_fd; @@ -284,6 +285,8 @@ static int do_send(struct btrfs_send *send, u64 parent_root_id, io_send.flags |= BTRFS_SEND_FLAG_OMIT_END_CMD; if (g_total_data_size) io_send.flags |= BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE; + if (g_fallocate) + io_send.flags |= BTRFS_SEND_FLAG_SUPPORT_FALLOCATE; ret = ioctl(subvol_fd, BTRFS_IOC_SEND, io_send); if (ret) { ret = -errno; @@ -427,7 +430,7 @@ int cmd_send(int argc, char **argv) memset(send, 0, sizeof(send)); send.dump_fd = fileno(stdout); - while ((c = getopt(argc, argv, veoc:f:i:p:)) != -1) { + while ((c = getopt(argc, argv, veoac:f:i:p:)) != -1) { switch (c) { case 'v': g_verbose++; @@ -517,6 +520,9 @@ int cmd_send(int argc, char **argv) case 'o': g_total_data_size = 1; break; + case 'a': + g_fallocate = 1; + break; case '?': default: fprintf(stderr, ERROR: send args invalid.\n); @@ -679,7 +685,7 @@ out: } const char * const cmd_send_usage[] = { - btrfs send [-veo] [-p parent] [-c clone-src] [-f outfile] subvol [subvol...], + btrfs send [-veoa] [-p parent] [-c clone-src] [-f outfile] subvol [subvol...], Send
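The flag translation done at the top of process_fallocate can be looked at in isolation. This is a sketch for Linux only, assuming the stream flag bits from patch 1/4 are (1 << 0) and (1 << 1); the helper name stream_flags_to_falloc_mode is hypothetical, while the FALLOC_FL_* constants come from linux/falloc.h.

```c
#include <stdint.h>
#include <linux/falloc.h> /* FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE */

/* Stream-level flag bits, assumed to match send.h from patch 1/4. */
#define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE  (1 << 0)
#define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE (1 << 1)

/*
 * Hypothetical helper: translate the flags carried in the send stream
 * into a mode argument suitable for fallocate(2).
 */
static int stream_flags_to_falloc_mode(uint32_t flags)
{
	int mode = 0;

	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE)
		mode |= FALLOC_FL_KEEP_SIZE;
	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE)
		mode |= FALLOC_FL_PUNCH_HOLE;
	return mode;
}
```

Keeping the stream flags separate from the FALLOC_FL_* values means the stream format does not depend on the numeric values of the kernel constants. Note that fallocate(2) requires FALLOC_FL_PUNCH_HOLE to be combined with FALLOC_FL_KEEP_SIZE, so a sender that wants a hole punched would be expected to set both stream flags.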
[PATCH 2/4 v2] Btrfs: send, implement total data size command to allow for progress estimation
This new send flag makes send first calculate the amount of new file data
(in bytes) the send root has relative to the parent root or, in the case of
a non-incremental send, the total amount of file data the stream will create
(including holes and prealloc extents). In other words, it computes the sum
of the lengths of all write, clone and fallocate operations that will be sent
through the send stream.

This data size value is sent in a new command, named
BTRFS_SEND_C_TOTAL_DATA_SIZE, that immediately follows a BTRFS_SEND_C_SUBVOL
or BTRFS_SEND_C_SNAPSHOT command, and precedes any command that changes a
file or the filesystem hierarchy. Upon receiving a write, clone or fallocate
command, the receiving end can increment a counter by the data length of that
command and therefore report progress by comparing the counter's value with
the data size value received in the BTRFS_SEND_C_TOTAL_DATA_SIZE command.

The approach is simple: before the normal operation of send, do a scan of the
file system tree for new inodes and new/changed file extent items, just like
in send's normal operation, and keep incrementing a counter with new inodes'
sizes and the sizes of file extents (and file holes) that are going to be
written, cloned or fallocated. This is actually a simpler and more
lightweight tree scan than the one we do when sending the changes, as it
doesn't process inode references and doesn't do any lookups in the extent
tree, for example.

After modifying btrfs-progs to understand this new command and report progress, here's an example (the -o flag tells btrfs send to pass the new flag to the kernel's send ioctl): $ btrfs send -o /mnt/sdd/snap_base | btrfs receive /mnt/sdc At subvol /mnt/sdd/snap_base At subvol snap_base About to receive 9212392667 bytes Subvolume /mnt/sdc//snap_base, 4059722426 / 9212392667 bytes received, 44.07%, 40.32MB/s $ btrfs send -o -p /mnt/sdd/snap_base /mnt/sdd/snap_incr | btrfs receive /mnt/sdc At subvol /mnt/sdd/snap_incr At subvol snap_incr About to receive 9571342213 bytes Subvolume /mnt/sdc//snap_incr, 6557345221 / 9571342213 bytes received, 68.51%, 51.04MB/s Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: A v2 stream is now only produced if the send ioctl caller passes in one of the new flags (BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE | BTRFS_SEND_FLAG_SUPPORT_FALLOCATE) to avoid breaking old clients. fs/btrfs/send.c | 194 ++-- 1 file changed, 162 insertions(+), 32 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 53712aa..f5db492 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -81,7 +81,13 @@ struct clone_root { #define SEND_CTX_MAX_NAME_CACHE_SIZE 128 #define SEND_CTX_NAME_CACHE_CLEAN_SIZE (SEND_CTX_MAX_NAME_CACHE_SIZE * 2) +enum btrfs_send_phase { + SEND_PHASE_STREAM_CHANGES, + SEND_PHASE_COMPUTE_DATA_SIZE, +}; + struct send_ctx { + enum btrfs_send_phase phase; struct file *send_filp; loff_t send_off; char *send_buf; @@ -116,6 +122,7 @@ struct send_ctx { u64 cur_inode_last_extent; u64 send_progress; + u64 total_data_size; struct list_head new_refs; struct list_head deleted_refs; @@ -692,6 +699,8 @@ static int send_rename(struct send_ctx *sctx, { int ret; + ASSERT(sctx-phase != SEND_PHASE_COMPUTE_DATA_SIZE); + verbose_printk(btrfs: send_rename %s - %s\n, from-start, to-start); ret = begin_cmd(sctx, BTRFS_SEND_C_RENAME); @@ -716,6 +725,8 @@ static int send_link(struct send_ctx *sctx, { int ret; + ASSERT(sctx-phase != 
SEND_PHASE_COMPUTE_DATA_SIZE); + verbose_printk(btrfs: send_link %s - %s\n, path-start, lnk-start); ret = begin_cmd(sctx, BTRFS_SEND_C_LINK); @@ -739,6 +750,8 @@ static int send_unlink(struct send_ctx *sctx, struct fs_path *path) { int ret; + ASSERT(sctx-phase != SEND_PHASE_COMPUTE_DATA_SIZE); + verbose_printk(btrfs: send_unlink %s\n, path-start); ret = begin_cmd(sctx, BTRFS_SEND_C_UNLINK); @@ -761,6 +774,8 @@ static int send_rmdir(struct send_ctx *sctx, struct fs_path *path) { int ret; + ASSERT(sctx-phase != SEND_PHASE_COMPUTE_DATA_SIZE); + verbose_printk(btrfs: send_rmdir %s\n, path-start); ret = begin_cmd(sctx, BTRFS_SEND_C_RMDIR); @@ -2308,6 +2323,9 @@ static int send_truncate(struct send_ctx *sctx, u64 ino, u64 gen, u64 size) int ret = 0; struct fs_path *p; + if (sctx-phase == SEND_PHASE_COMPUTE_DATA_SIZE) + return 0; + verbose_printk(btrfs: send_truncate %llu size=%llu\n, ino, size); p = fs_path_alloc(); @@ -2337,6 +2355,8 @@ static int send_chmod(struct send_ctx *sctx, u64 ino, u64 gen, u64 mode) int ret = 0; struct fs_path *p; + ASSERT(sctx-phase != SEND_PHASE_COMPUTE_DATA_SIZE); + verbose_printk(btrfs: send_chmod
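The diff is cut off above. As a rough illustration of what the first phase computes according to the commit message (the sum of the lengths of all write, clone and fallocate operations), here is a simplified model. The types and names are hypothetical and are not taken from fs/btrfs/send.c, which walks btree items rather than a flat array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the operations a send stream carries. */
enum op_type { OP_WRITE, OP_CLONE, OP_FALLOCATE, OP_RENAME, OP_CHMOD };

struct op {
	enum op_type type;
	uint64_t len; /* data length; ignored for metadata-only operations */
};

/* Sum the lengths of all write, clone and fallocate operations. */
static uint64_t compute_total_data_size(const struct op *ops, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (ops[i].type) {
		case OP_WRITE:
		case OP_CLONE:
		case OP_FALLOCATE:
			total += ops[i].len;
			break;
		default:
			break; /* metadata-only operations add no data */
		}
	}
	return total;
}
```

This also suggests why the patch adds ASSERTs to send_rename, send_link and similar functions: in this model, metadata-only operations should never be emitted during the size computation phase.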
[PATCH 4/4 v2] Btrfs-progs: add write and clone commands debug info to receive
When -vv is specified, print information about received write and clone
commands too. We already do this for other commands, and it's very useful
for debugging and troubleshooting.

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: Added new send ioctl flag BTRFS_SEND_FLAG_SUPPORT_FALLOCATE. A version 2
    stream is now only produced if the ioctl caller specifies at least one of
    the new send flags (BTRFS_SEND_FLAG_SUPPORT_FALLOCATE or
    BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE).

 cmds-receive.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/cmds-receive.c b/cmds-receive.c
index 3f30066..7a23823 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -636,6 +636,10 @@ static int process_write(const char *path, const void *data, u64 offset,
 	u64 pos = 0;
 	int w;
 
+	if (g_verbose >= 2)
+		fprintf(stderr, "write %s, offset %llu, len %llu\n",
+			path, offset, len);
+
 	ret = open_inode_for_write(r, full_path);
 	if (ret < 0)
 		goto out;
@@ -672,6 +676,11 @@ static int process_clone(const char *path, u64 offset, u64 len,
 	char *full_clone_path = NULL;
 	int clone_fd = -1;
 
+	if (g_verbose >= 2)
+		fprintf(stderr,
+			"clone %s, offset %llu, len %llu, clone path %s, clone offset %llu\n",
+			path, offset, len, clone_path, clone_offset);
+
 	ret = open_inode_for_write(r, full_path);
 	if (ret < 0)
 		goto out;
-- 
1.9.1
[PATCH v2] xfstests: btrfs, test send's ability to punch holes and prealloc extents
This test verifies that after an incremental btrfs send the replicated file has the same exact hole and data structure as in the origin filesystem. This didn't use to be the case before the send stream version 2 - holes were sent as write operations of 0 valued bytes instead of punching holes with the fallocate system call, and pre-allocated extents were sent as well as write operations of 0 valued bytes instead of intructions for the receiver to use the fallocate system call. It also checks that prealloc extents that lie beyond the file's size are replicated by an incremental send. Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: Addressed Dave's comments, and updated btrfs send invocation, by specifying the new command line option (-a) that enables use of fallocate - added function _require_btrfs_send_fallocate_flag() to skip the test when an old version of btrfs-progs is found. common/rc | 9 tests/btrfs/047 | 121 tests/btrfs/047.out | 27 tests/btrfs/group | 1 + 4 files changed, 158 insertions(+) create mode 100755 tests/btrfs/047 create mode 100644 tests/btrfs/047.out diff --git a/common/rc b/common/rc index acf419b..e94e51c 100644 --- a/common/rc +++ b/common/rc @@ -2262,6 +2262,15 @@ _run_btrfs_util_prog() run_check $BTRFS_UTIL_PROG $* } +_require_btrfs_send_fallocate_flag() +{ + $BTRFS_UTIL_PROG send 21 | \ + grep '^[ \t]*\-a[ \t]\+.* fallocate ' /dev/null 21 + if [ $? -ne 0 ]; then + _notrun Missing btrfs-progs send -a command line option, skipped this test + fi +} + init_rc() { if [ $iam == new ] diff --git a/tests/btrfs/047 b/tests/btrfs/047 new file mode 100755 index 000..c8171a5 --- /dev/null +++ b/tests/btrfs/047 @@ -0,0 +1,121 @@ +#! /bin/bash +# FS QA Test No. btrfs/047 +# +# Verify that after an incremental btrfs send the replicated file has +# the same exact hole and data structure as in the origin filesystem. 
+# This didn't use to be the case before the send stream version 2 - +# holes were sent as write operations of 0 valued bytes instead of punching +# holes with the fallocate system call, and pre-allocated extents were sent +# as well as write operations of 0 valued bytes instead of intructions for +# the receiver to use the fallocate system call. Also check that prealloc +# extents that lie beyond the file's size are replicated by an incremental +# send. +# +# More specifically, this structure preserving guarantee was added by the +# following linux kernel commits: +# +#Btrfs: send, use fallocate command to punch holes +#Btrfs: send, use fallocate command to allocate extents +# +#--- +# Copyright (c) 2014 Filipe Manana. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +tmp=/tmp/$$ +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + +_cleanup() +{ +rm -fr $send_files_dir +rm -fr $tmp +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. 
./common/punch + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_fssum +_require_xfs_io_fiemap +_require_btrfs_send_fallocate_flag +_need_to_be_root + +send_files_dir=$TEST_DIR/btrfs-test-$seq + +rm -f $seqres.full +rm -fr $send_files_dir +mkdir $send_files_dir + +_scratch_mkfs /dev/null 21 +_scratch_mount + +$XFS_IO_PROG -f -c pwrite -S 0x01 -b 30 0 30 $SCRATCH_MNT/foo \ + | _filter_xfs_io + +_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 + +$XFS_IO_PROG -c fpunch 10 5 $SCRATCH_MNT/foo +$XFS_IO_PROG -c falloc 10 5 $SCRATCH_MNT/foo +$XFS_IO_PROG -c pwrite -S 0xff -b 1000 12 1000 $SCRATCH_MNT/foo \ + | _filter_xfs_io +$XFS_IO_PROG -c fpunch 25 2 $SCRATCH_MNT/foo + +$XFS_IO_PROG -c falloc -k 30 100 $SCRATCH_MNT/foo +$XFS_IO_PROG -c falloc -k 900 200 $SCRATCH_MNT/foo + +_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 + +_run_btrfs_util_prog send -a $SCRATCH_MNT/mysnap1 -f
[PATCH v2] xfstests: btrfs, add test for btrfs properties
This test case verifies the btrfs properties feature, a new feature introduced in the linux kernel version 3.14. Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: Addressed Dave's comments, removed function to check for existence of the btrfs-progs property command and use instead existing function _require_btrfs which checks if a btrfs-progs command exists and is equivalent to what I had before. tests/btrfs/048 | 220 tests/btrfs/048.out | 78 +++ tests/btrfs/group | 1 + 3 files changed, 299 insertions(+) create mode 100755 tests/btrfs/048 create mode 100644 tests/btrfs/048.out diff --git a/tests/btrfs/048 b/tests/btrfs/048 new file mode 100755 index 000..e998f97 --- /dev/null +++ b/tests/btrfs/048 @@ -0,0 +1,220 @@ +#! /bin/bash +# FS QA Test No. btrfs/048 +# +# Btrfs properties test. The btrfs properties feature was introduced in the +# linux kernel 3.14. +# +#--- +# Copyright (c) 2014 Filipe Manana. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + +_cleanup() +{ +rm -fr $send_files_dir +rm -fr $tmp +} + +# get standard environment, filters and checks +. ./common/rc +. 
./common/filter + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_btrfs property +_need_to_be_root + +send_files_dir=$TEST_DIR/btrfs-test-$seq + +rm -f $seqres.full +rm -fr $send_files_dir +mkdir $send_files_dir + +_scratch_mkfs /dev/null 21 +_scratch_mount + +echo Testing label property +$BTRFS_UTIL_PROG property get $SCRATCH_MNT label +echo *** +$BTRFS_UTIL_PROG property set $SCRATCH_MNT label foobar +$BTRFS_UTIL_PROG property get $SCRATCH_MNT label +echo *** +$BTRFS_UTIL_PROG property get $SCRATCH_MNT +echo *** +$BTRFS_UTIL_PROG property set $SCRATCH_MNT label '' +$BTRFS_UTIL_PROG property get $SCRATCH_MNT label +echo *** +mkdir $SCRATCH_MNT/testdir +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir label +echo *** + +echo -e \nTesting subvolume ro property +_run_btrfs_util_prog subvolume create $SCRATCH_MNT/sv1 +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/sv1 ro +echo *** +$BTRFS_UTIL_PROG property set $SCRATCH_MNT/sv1 ro foo +echo *** +$BTRFS_UTIL_PROG property set $SCRATCH_MNT/sv1 ro true +echo *** +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/sv1 ro +echo *** +touch $SCRATCH_MNT/sv1/foobar 21 | _filter_scratch +echo *** +$BTRFS_UTIL_PROG property set $SCRATCH_MNT/sv1 ro false +touch $SCRATCH_MNT/sv1/foobar 21 | _filter_scratch +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/sv1 +echo *** + +echo -e \nTesting compression property +mkdir $SCRATCH_MNT/testdir/subdir1 +touch $SCRATCH_MNT/testdir/file1 +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir/file1 compression +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir/subdir1 compression +echo *** +$BTRFS_UTIL_PROG property set $SCRATCH_MNT/testdir/file1 compression \ + foo 21 | _filter_scratch +echo *** +$BTRFS_UTIL_PROG property set $SCRATCH_MNT/testdir/file1 compression lzo +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir/file1 compression + +# Verify property was persisted. 
+_scratch_unmount +_check_scratch_fs +_scratch_mount +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir/file1 compression +$BTRFS_UTIL_PROG property set $SCRATCH_MNT/testdir/file1 compression zlib +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir/file1 compression +$BTRFS_UTIL_PROG property set $SCRATCH_MNT/testdir/file1 compression '' +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir/file1 compression + +# Test compression property inheritance. +echo *** +$BTRFS_UTIL_PROG property set $SCRATCH_MNT/testdir/subdir1 compression lzo +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir/subdir1 compression +echo *** +mkdir $SCRATCH_MNT/testdir/subdir1/subsubdir +touch $SCRATCH_MNT/testdir/subdir1/some_file +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir/subdir1/subsubdir compression +echo *** +$BTRFS_UTIL_PROG property get $SCRATCH_MNT/testdir/subdir1/some_file compression +echo *** +mkdir
Re: [PATCH] xfstests: btrfs, test send's ability to punch holes and prealloc extents
On Wed, Apr 16, 2014 at 1:23 AM, Dave Chinner da...@fromorbit.com wrote: On Tue, Apr 15, 2014 at 05:43:21PM +0100, Filipe David Borba Manana wrote: This test verifies that after an incremental btrfs send the replicated file has the same exact hole and data structure as in the origin filesystem. This didn't use to be the case before the send stream version 2 - holes were sent as write operations of 0 valued bytes instead of punching holes with the fallocate system call, and pre-allocated extents were sent as well as write operations of 0 valued bytes instead of intructions for the receiver to use the fallocate system call. Also checks that prealloc extents that lie beyond the file's size are replicated by an incremental send. Can you wrap commit messages at 68 columns? +md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch +# List all hole and data segments. +$XFS_IO_PROG -r -c seek -r -a 0 $SCRATCH_MNT/mysnap2/foo +# List all extents, we're interested here in prealloc extents that lie beyond +# the file's size. +$XFS_IO_PROG -r -c fiemap -l $SCRATCH_MNT/mysnap2/foo | _filter_scratch That dumps raw block numbers into the golden output. _filter_fiemap is probably needed here. Hum, just tried it and uploaded a v2. However I'm now noticing it doesn't do everything I had in mind. _filter_fiemap is not showing the extents falloc -k created, only a collapsed range of holes. So my intention is to verify not just holes, but also the extents created by 'falloc -k'. 
The following filter I just made locally gives me that:

_filter_all_fiemap()
{
	awk --posix '
		$3 ~ /hole/ {
			print $1, $2, $3;
			next;
		}
		$3 ~ /[[:xdigit:]]*..[[:xdigit:]]/ {
			print $1, $2, "extent";
			next;
		}'
}

(nicely printed/indented at https://friendpaste.com/1JtG5bts2Sz0LWhUutCpzE,
as e-mail is not good for code pasting)

Which gives me:

0: [0..191]: extent
1: [192..199]: extent
2: [200..231]: extent
3: [232..239]: extent
4: [240..287]: extent
5: [288..295]: extent
6: [296..487]: extent
7: [488..495]: extent
8: [496..519]: hole
9: [520..527]: extent
10: [528..583]: extent
11: [584..591]: extent
12: [592..2543]: extent
13: [2544..17575]: hole
14: [17576..21487]: extent

Versus only (from _filter_fiemap):

0: [496..17575]: hole

I couldn't find any existing similar filter. Is it ok to add this new
filter? If so, does this name make sense, and does it make sense to add it
to the common/punch file or to some other file?

Thanks Dave

> +md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
> +# List all hole and data segments.
> +$XFS_IO_PROG -r -c seek -r -a 0 $SCRATCH_MNT/mysnap2/foo
> +# List all extents, we're interested here in prealloc extents that lie beyond
> +# the file's size.
> +$XFS_IO_PROG -r -c fiemap -l $SCRATCH_MNT/mysnap2/foo | _filter_scratch
>
> Same here.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com

-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men.
Re: Rebalance makes BTRFS 10x slower
Hi Duncan,

> But as I said, I'd sure like to get to the bottom of this one, since I do
> believe it has other potential implications in terms of bugs, etc. In
> theory, a balance should either not affect performance or should improve
> it, so getting to the bottom of why it's having such a bad performance
> impact for many really is something that needs to be done.

I agree - getting to the bottom of this would be interesting. The question
is how?

Regards, Clemens
Lost /home subvolume after btrfs crash
Hello,

I have a broken btrfs file system on a laptop. Debug material is available
here: https://www.dropbox.com/sh/utv8b3qd0do6a04/zTwGQCrN9x

Most importantly, the /home subvolume is lost. All attempts to recover data
from it (btrfs-restore, mount -o recovery, btrfsck) have failed so far (/home
is simply empty in the btrfs-restore output), although I was able to recover
most of the other subvolumes and the main FS. The btrfs tools would typically
segfault with failed assertions when analyzing the FS.

Grepping through the entire volume shows that (at least parts of) /home are
still available, but they seem to be disconnected from the main tree somehow,
or the metadata is so corrupt that the tools bail out trying to find files
under it. Making matters worse, the important data was encrypted using
ecryptfs, so I need to recover the ciphertext first and then find a way to
recover the plaintext (I do have the passphrase).

The crash happened with a rather old OpenSUSE 12.2 kernel (3.4.11-2.16). The
user says she was just surfing the web normally when the crash occurred (no
screenshot of the original crash, unfortunately). On the next boot, the btrfs
root file system couldn't be mounted any more. After that I booted an
OpenSUSE 12.3 rescue DVD and created the debug material shown above.

Any hints on how to retrieve files from the /home subvolume of this corrupt
file system would be highly appreciated.

Best regards
Martin
[PATCH v2] btrfs: replace error code from btrfs_drop_extents
There's a case that clone does not handle and that used to hit a BUG_ON
(testcase xfstests/btrfs/035); it now returns EINVAL. This error code is
confusing to the ioctl caller, as it normally signifies erroneous arguments.

Change it to EOPNOTSUPP, which allows a fall back to copy instead of clone.
This does not affect the common reflink operation.

Signed-off-by: David Sterba dste...@suse.cz
---

v2: fix error code typo in changelog, pointed out by Mitch

 fs/btrfs/file.c  |    6 +++---
 fs/btrfs/ioctl.c |    4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index eb742c07e7a4..bdd0b17150dd 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -800,7 +800,7 @@ next_slot:
 		if (start > key.offset && end < extent_end) {
 			BUG_ON(del_nr > 0);
 			if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
-				ret = -EINVAL;
+				ret = -EOPNOTSUPP;
 				break;
 			}
 
@@ -846,7 +846,7 @@ next_slot:
 		 */
 		if (start <= key.offset && end < extent_end) {
 			if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
-				ret = -EINVAL;
+				ret = -EOPNOTSUPP;
 				break;
 			}
 
@@ -872,7 +872,7 @@ next_slot:
 		if (start > key.offset && end >= extent_end) {
 			BUG_ON(del_nr > 0);
 			if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
-				ret = -EINVAL;
+				ret = -EOPNOTSUPP;
 				break;
 			}
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e79ff6b90cb7..2ad7de94efef 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3066,7 +3066,7 @@ process_slot:
 						new_key.offset + datal,
 						1);
 				if (ret) {
-					if (ret != -EINVAL)
+					if (ret != -EOPNOTSUPP)
 						btrfs_abort_transaction(trans,
 							root, ret);
 					btrfs_end_transaction(trans, root);
@@ -3141,7 +3141,7 @@ process_slot:
 						new_key.offset + datal,
 						1);
 				if (ret) {
-					if (ret != -EINVAL)
+					if (ret != -EOPNOTSUPP)
 						btrfs_abort_transaction(trans,
 							root, ret);
 					btrfs_end_transaction(trans, root);
-- 
1.7.9
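From the caller's side, the point of the new error code is that "clone is not possible here" becomes distinguishable from "bad arguments". A minimal sketch of such a caller follows; the function pointer type and the stub functions are hypothetical stand-ins for a real clone ioctl and a read/write copy loop, they are not btrfs or coreutils code.

```c
#include <errno.h>

/* Hypothetical callback type standing in for a clone ioctl or a copy loop. */
typedef int (*range_op_fn)(int src_fd, int dst_fd);

/*
 * Try to clone; fall back to a plain copy only when clone reports
 * -EOPNOTSUPP. Any other error, including -EINVAL, is returned as-is.
 */
static int clone_or_copy(int src_fd, int dst_fd,
			 range_op_fn try_clone, range_op_fn do_copy)
{
	int ret = try_clone(src_fd, dst_fd);

	if (ret == -EOPNOTSUPP)
		return do_copy(src_fd, dst_fd);
	return ret;
}

/* Stand-in stubs so the fallback path can be exercised without a filesystem. */
static int copies_done;

static int clone_unsupported(int src_fd, int dst_fd)
{
	(void)src_fd; (void)dst_fd;
	return -EOPNOTSUPP;
}

static int clone_invalid(int src_fd, int dst_fd)
{
	(void)src_fd; (void)dst_fd;
	return -EINVAL;
}

static int copy_stub(int src_fd, int dst_fd)
{
	(void)src_fd; (void)dst_fd;
	copies_done++;
	return 0;
}
```

With -EINVAL reserved for genuinely wrong arguments, a caller written this way does not silently copy data when it has in fact passed a bad range.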
[PATCH 3/4 v3] Btrfs: send, use fallocate command to punch holes
Instead of sending a write command with a data buffer filled with 0 value bytes, use the fallocate command, introduced in the send stream version 2, to tell the receiver to punch a file hole using the fallocate system call.

Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
---
V2: A v2 stream is now only produced if the send ioctl caller passes in one of the new flags (BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE | BTRFS_SEND_FLAG_SUPPORT_FALLOCATE) to avoid breaking old clients.
V3: Added missing path allocation, messed up rebase.

 fs/btrfs/send.c | 55 ---
 fs/btrfs/send.h | 4
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f5db492..bb9afea 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -564,6 +564,7 @@ static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)
 		return tlv_put(sctx, attr, &__tmp, sizeof(__tmp)); \
 	}
 
+TLV_PUT_DEFINE_INT(32)
 TLV_PUT_DEFINE_INT(64)
 
 static int tlv_put_string(struct send_ctx *sctx, u16 attr,
@@ -4483,18 +4484,59 @@ out:
 	return ret;
 }
 
+static int send_fallocate(struct send_ctx *sctx, u32 flags,
+			  u64 offset, u64 len)
+{
+	struct fs_path *p = NULL;
+	int ret = 0;
+
+	ASSERT(sctx->flags & BTRFS_SEND_FLAG_SUPPORT_FALLOCATE);
+
+	if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE) {
+		sctx->total_data_size += len;
+		return 0;
+	}
+
+	p = fs_path_alloc();
+	if (!p)
+		return -ENOMEM;
+	ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
+	if (ret < 0)
+		goto out;
+
+	ret = begin_cmd(sctx, BTRFS_SEND_C_FALLOCATE);
+	if (ret < 0)
+		goto out;
+	TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
+	TLV_PUT_U32(sctx, BTRFS_SEND_A_FALLOCATE_FLAGS, flags);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
+	TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len);
+	ret = send_cmd(sctx);
+
+tlv_put_failure:
+out:
+	fs_path_free(p);
+	return ret;
+}
+
 static int send_hole(struct send_ctx *sctx, u64 end)
 {
 	struct fs_path *p = NULL;
 	u64 offset = sctx->cur_inode_last_extent;
-	u64 len;
+	u64 len = end - offset;
 	int ret = 0;
 
 	if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE) {
-		sctx->total_data_size += end - offset;
+		sctx->total_data_size += len;
 		return 0;
 	}
 
+	if (sctx->flags & BTRFS_SEND_FLAG_SUPPORT_FALLOCATE)
+		return send_fallocate(sctx,
+				      BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS,
+				      offset,
+				      len);
+
 	p = fs_path_alloc();
 	if (!p)
 		return -ENOMEM;
@@ -4551,7 +4593,8 @@ static int send_write_or_clone(struct send_ctx *sctx,
 		len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
 	}
 
-	if (offset + len > sctx->cur_inode_size)
+	if (offset < sctx->cur_inode_size &&
+	    offset + len > sctx->cur_inode_size)
 		len = sctx->cur_inode_size - offset;
 	if (len == 0) {
 		ret = 0;
@@ -4568,6 +4611,12 @@ static int send_write_or_clone(struct send_ctx *sctx,
 		ret = send_clone(sctx, offset, len, clone_root);
 	} else if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA) {
 		ret = send_update_extent(sctx, offset, len);
+	} else if (btrfs_file_extent_disk_bytenr(path->nodes[0], ei) == 0 &&
+		   type != BTRFS_FILE_EXTENT_INLINE &&
+		   (sctx->flags & BTRFS_SEND_FLAG_SUPPORT_FALLOCATE) &&
+		   offset < sctx->cur_inode_size) {
+		ret = send_fallocate(sctx, BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS,
+				     offset, len);
 	} else {
 		while (pos < len) {
 			l = len - pos;
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index 367030d..a632c0d 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -141,6 +141,10 @@ enum {
 #define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE   (1 << 0)
 #define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE  (1 << 1)
 
+#define BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS        \
+		(BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE |  \
+		 BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE)
+
 #ifdef __KERNEL__
 long btrfs_ioctl_send(struct file *mnt_file, void __user *arg);
 #endif
-- 
1.9.1
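On the receiving side, the flags carried by the fallocate command have to be turned into a mode for fallocate(2). The sketch below shows one way that translation might look. It is not taken from the btrfs-progs receive code: `stream_flags_to_falloc_mode` is a hypothetical helper, and the two stream flag values are assumed to be bit 0 and bit 1, as the send.h hunk suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/falloc.h>	/* FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE */

/* Assumed to match the definitions in fs/btrfs/send.h from this patch. */
#define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE	(1u << 0)
#define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE	(1u << 1)

/*
 * Hypothetical helper: map send-stream fallocate flags to a fallocate(2)
 * mode. Returns -1 if the stream carries a flag bit this sketch does not
 * know about, so the caller can reject the command instead of guessing.
 */
static int stream_flags_to_falloc_mode(uint32_t flags)
{
	const uint32_t known = BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE |
			       BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE;
	int mode = 0;

	if (flags & ~known)
		return -1;
	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE)
		mode |= FALLOC_FL_KEEP_SIZE;
	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE)
		mode |= FALLOC_FL_PUNCH_HOLE;
	return mode;
}
```

A receiver would then call `fallocate(fd, mode, offset, len)` with the result. fallocate(2) requires FALLOC_FL_PUNCH_HOLE to be combined with FALLOC_FL_KEEP_SIZE, which is consistent with the patch sending both flags together for holes.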
Re: [PATCH 1/2] btrfs: protect snapshots from deleting during send
On 2014/04/16 03:40 PM, Chris Mason wrote:
> So in my example with the automated tool, the tool really shouldn't be deleting a snapshot where send is in progress. The tool should be told that snapshot is busy and try to delete it again later.
>
> It makes more sense now, I'll queue this up for 3.16 and we can try it out in -next.
>
> -chris

So ... does this mean the plan is to a) have userland tool give an error; or b) a deletion would be scheduled in the background for as soon as the send has completed?

-- 
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
Re: [PATCH 1/2] btrfs: protect snapshots from deleting during send
On Wed, Apr 16, 2014 at 09:40:41AM -0400, Chris Mason wrote:
> It makes more sense now, I'll queue this up for 3.16 and we can try it out in -next.

Thanks. 3.16 is fine for me.
Re: [PATCH 1/2] btrfs: protect snapshots from deleting during send
On Wed, Apr 16, 2014 at 04:59:09PM +0200, Brendan Hide wrote:
> On 2014/04/16 03:40 PM, Chris Mason wrote:
>> So in my example with the automated tool, the tool really shouldn't be deleting a snapshot where send is in progress. The tool should be told that snapshot is busy and try to delete it again later.
>>
>> It makes more sense now, I'll queue this up for 3.16 and we can try it out in -next.
>>
>> -chris
>
> So ... does this mean the plan is to a) have userland tool give an error; or b) a deletion would be scheduled in the background for as soon as the send has completed?

b) is current state, a) is the plan with the patch, 'btrfs subvol delete' would return EPERM/EBUSY
Re: Lost /home subvolume after btrfs crash
On Apr 16, 2014, at 7:53 AM, Martin Wilck <mwi...@arcor.de> wrote:

> The crash happened with a rather old OpenSUSE 12.2 kernel (3.4.11-2.16). The user says she was just surfing the web normally when the crash occurred (no screenshot of the original crash, unfortunately). On the next boot, the btrfs root file system couldn't be mounted any more. After that I booted an OpenSUSE 12.3 rescue DVD and created the debug material shown above.

OpenSUSE 12.3 is using kernel 3.7, which is also old for this sort of recovery attempt. Even openSUSE 13.1 is at 3.11.6, which might work in a bind, but if it doesn't, inevitably someone will suggest you use something even newer. Current stable is 3.14.1; I suggest giving 3.13 or 3.14 a shot at this with -o ro,recovery as a first step and see if it at least mounts.

And an old kernel implies old btrfs-progs too, which is where the code for btrfsck and btrfs restore is contained. So that needs to be at least v3.12. And hopefully you didn't use --repair with btrfsck yet.

Chris Murphy
[PATCH 4/4 v3] Btrfs: send, use fallocate command to allocate extents
The send stream version 2 adds the fallocate command, which can be used to allocate extents for a file or punch holes in a file. Previously we were ignoring file prealloc extents or treating them as extents filled with 0 bytes and sending a regular write command to the stream.

After this change, together with my previous change titled:

    Btrfs: send, use fallocate command to punch holes

an incremental send preserves the hole and data structure of files, which can be seen via calls to lseek with the whence parameter set to SEEK_DATA or SEEK_HOLE, as the example below shows:

    mkfs.btrfs -f /dev/sdc
    mount /dev/sdc /mnt
    xfs_io -f -c "pwrite -S 0x01 -b 30 0 30" /mnt/foo
    btrfs subvolume snapshot -r /mnt /mnt/mysnap1

    xfs_io -c "fpunch 10 5" /mnt/foo
    xfs_io -c "falloc 10 5" /mnt/foo
    xfs_io -c "pwrite -S 0xff -b 1000 12 1000" /mnt/foo
    xfs_io -c "fpunch 25 2" /mnt/foo

    # prealloc extents that start beyond the inode's size
    xfs_io -c "falloc -k 30 100" /mnt/foo
    xfs_io -c "falloc -k 900 200" /mnt/foo

    btrfs subvolume snapshot -r /mnt /mnt/mysnap2

    btrfs send /mnt/mysnap1 -f /tmp/1.snap
    btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/2.snap

    mkfs.btrfs -f /dev/sdd
    mount /dev/sdd /mnt2
    btrfs receive /mnt2 -f /tmp/1.snap
    btrfs receive /mnt2 -f /tmp/2.snap

Before this change the hole/data structure differed between both filesystems:

    $ xfs_io -r -c 'seek -r -a 0' /mnt/mysnap2/foo
    Whence  Result
    DATA    0
    HOLE    102400
    DATA    118784
    HOLE    122880
    DATA    147456
    HOLE    253952
    DATA    266240
    HOLE    30

    $ xfs_io -r -c 'seek -r -a 0' /mnt2/mysnap2/foo
    Whence  Result
    DATA    0
    HOLE    30

After this change the second filesystem (/dev/sdd) ends up with the same hole/data structure as the first filesystem.

Also, after this change, prealloc extents that lie beyond the inode's size (were allocated with fallocate + keep size flag) are also replicated by an incremental send. For the above test, it can be observed via fiemap (or btrfs-debug-tree):

    $ xfs_io -r -c 'fiemap -l' /mnt2/mysnap2/foo
    0: [0..191]: 25096..25287 192 blocks
    1: [192..199]: 24672..24679 8 blocks
    2: [200..231]: 24584..24615 32 blocks
    3: [232..239]: 24680..24687 8 blocks
    4: [240..287]: 24616..24663 48 blocks
    5: [288..295]: 24688..24695 8 blocks
    6: [296..487]: 25392..25583 192 blocks
    7: [488..495]: 24696..24703 8 blocks
    8: [496..519]: hole 24 blocks
    9: [520..527]: 24704..24711 8 blocks
    10: [528..583]: 25624..25679 56 blocks
    11: [584..591]: 24712..24719 8 blocks
    12: [592..2543]: 26192..28143 1952 blocks
    13: [2544..17575]: hole 15032 blocks
    14: [17576..21487]: 28144..32055 3912 blocks

A test case for xfstests will follow.

Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
---
V2: Added new send ioctl flag BTRFS_SEND_FLAG_SUPPORT_FALLOCATE. A version 2 stream is now only produced if the ioctl caller specifies at least one of the new send flags (BTRFS_SEND_FLAG_SUPPORT_FALLOCATE or BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE).
V3: Fixed rebase, removed some duplicate logic on truncate + falloc -k.
 fs/btrfs/send.c | 78 +
 1 file changed, 57 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index bb9afea..86b6a87 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -113,9 +113,10 @@ struct send_ctx {
 	 */
 	u64 cur_ino;
 	u64 cur_inode_gen;
-	int cur_inode_new;
-	int cur_inode_new_gen;
-	int cur_inode_deleted;
+	u8 cur_inode_new:1;
+	u8 cur_inode_new_gen:1;
+	u8 cur_inode_skip_truncate:1;
+	u8 cur_inode_deleted:1;
 	u64 cur_inode_size;
 	u64 cur_inode_mode;
 	u64 cur_inode_rdev;
@@ -4563,6 +4564,19 @@ tlv_put_failure:
 	return ret;
 }
 
+static int truncate_before_falloc(struct send_ctx *sctx)
+{
+	int ret = 0;
+
+	if (!sctx->cur_inode_skip_truncate) {
+		ret = send_truncate(sctx, sctx->cur_ino,
+				    sctx->cur_inode_gen,
+				    sctx->cur_inode_size);
+		sctx->cur_inode_skip_truncate = 1;
+	}
+	return ret;
+}
+
 static int send_write_or_clone(struct send_ctx *sctx,
 			       struct btrfs_path *path,
 			       struct btrfs_key *key,
@@ -4602,8 +4616,7 @@ static int send_write_or_clone(struct send_ctx *sctx,
 	}
 
 	if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE) {
-		if (offset < sctx->cur_inode_size)
-			sctx->total_data_size += len;
+		sctx->total_data_size += len;
 		goto out;
 	}
 
@@ -4617,6 +4630,20 @@ static int send_write_or_clone(struct send_ctx *sctx,
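The hole/data layout that this patch's description checks with xfs_io can also be inspected directly with lseek. The following is a rough sketch under stated assumptions, not part of the patch: `print_data_hole_map` is a hypothetical helper, its output format only loosely resembles the xfs_io listing, and the fallback values for SEEK_DATA and SEEK_HOLE are the Linux ones, used only when the headers do not already define them.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, normally provided by <unistd.h> under _GNU_SOURCE. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Walk the file from offset 0 and print where each data segment starts
 * and where the following hole starts. Returns the number of lines
 * printed, or -1 on an unexpected lseek error. ENXIO from SEEK_DATA
 * means there is no more data, which ends the walk normally.
 */
static int print_data_hole_map(int fd, FILE *out)
{
	off_t pos = 0;
	int lines = 0;

	assert(out != NULL);

	for (;;) {
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data < 0)
			return errno == ENXIO ? lines : -1;
		fprintf(out, "DATA %lld\n", (long long)data);

		/* Every file has an implicit hole at EOF, so this succeeds. */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;
		fprintf(out, "HOLE %lld\n", (long long)hole);

		lines += 2;
		pos = hole;
	}
}
```

On a filesystem without real hole support the kernel reports the whole file as a single data segment, so the walk still terminates; it just cannot show any holes.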
Re: [PATCH 00/27] Replace the old man page with asciidoc and man page for each btrfs subcommand.
On Wed, Apr 02, 2014 at 04:29:11PM +0800, Qu Wenruo wrote:
> Convert the old btrfs man pages to new asciidoc and split the huge btrfs man page into subcommand man page.

I'm merging this patchset into the base series of integration because several patches need to update the docs and it's no longer feasible to keep it in a separate branch from the patches.

> btrfs-progs: Convert man page for btrfs-dedup.

As dedup is not yet merged in kernel, the docs will not be part of the branch (but will be otherwise present in the integration).

> btrfs-progs: Convert man page for btrfsck

I've skipped the patch adding btrfsck and linked to 'btrfs-check' instead, patch will follow.
[PATCH] btrfs-progs: doc: link btrfsck to btrfs-check
The 'btrfsck' command has been deprecated in favor of 'btrfs check'. For compatibility install a symlink to the btrfs-check.8 manpage.

CC: Qu Wenruo <quwen...@cn.fujitsu.com>
Signed-off-by: David Sterba <dste...@suse.cz>
---
 Documentation/Makefile        | 2 ++
 Documentation/btrfs-check.txt | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index ec8598bb57d3..1eef9fd57da3 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -48,6 +48,7 @@ XMLTO_EXTRA = -m manpage-bold-literal.xsl
 GZIP = gzip
 INSTALL ?= install
 RM ?= rm -f
+LNS ?= ln -sf
 
 BTRFS_VERSION = $(shell sed -n 's/.*BTRFS_BUILD_VERSION Btrfs \(.*\)/\1/p'\
 	../version.h)
@@ -73,6 +74,7 @@ install: install-man
 install-man: man
 	$(INSTALL) -d -m 755 $(DESTDIR)$(man8dir)
 	$(INSTALL) -m 644 $(GZ_MAN8) $(DESTDIR)$(man8dir)
+	$(LNS) btrfs-check.txt $(DESTDIR)$(man8dir)
 
 clean:
 	$(RM) *.xml *.xml+ *.8 *.8.gz
diff --git a/Documentation/btrfs-check.txt b/Documentation/btrfs-check.txt
index ddd7fe77eca2..485a49cbc3ec 100644
--- a/Documentation/btrfs-check.txt
+++ b/Documentation/btrfs-check.txt
@@ -18,6 +18,8 @@ command, it is *highly* recommended to read the following btrfs wiki before
 executing 'btrfs check' with '--repair' option:
 +
 https://btrfs.wiki.kernel.org/index.php/Btrfsck
 
+'btrfsck' is an alias of 'btrfs check' command and is now deprecated.
+
 OPTIONS
 -------
 -s|--support superblock::
@@ -47,4 +49,3 @@ SEE ALSO
 `mkfs.btrfs`(8),
 `btrfs-scrub`(8),
 `btrfs-rescue`(8)
-`btrfsck`(8)
-- 
1.9.0
Re: [PATCH 2/4 v2] Btrfs-progs: send, implement total data size callback and progress report
On Wed, Apr 16, 2014 at 03:56:15PM +0100, Filipe David Borba Manana wrote:
> V2: Added new send ioctl flag BTRFS_SEND_FLAG_SUPPORT_FALLOCATE. A version 2 stream is now only produced if the ioctl caller specifies at least one of the new send flags (BTRFS_SEND_FLAG_SUPPORT_FALLOCATE or BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE).

Good.

> @@ -156,6 +165,12 @@ static int process_subvol(const char *path, const u8 *uuid, u64 ctransid,
>  		goto out;
>
>  	r->cur_subvol = calloc(1, sizeof(*r->cur_subvol));
> +	r->total_data_size = 0;
> +	r->bytes_received = 0;
> +	r->progress = 0.0;
> +	r->last_progress_update = 0;
> +	r->bytes_received_last_update = 0;
> +	r->target = "Subvolume";
>
>  	if (strlen(r->dest_dir_path) == 0)
>  		r->cur_subvol->path = strdup(path);
> @@ -205,6 +220,12 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
>  		goto out;
>
>  	r->cur_subvol = calloc(1, sizeof(*r->cur_subvol));
> +	r->total_data_size = 0;
> +	r->bytes_received = 0;
> +	r->progress = 0.0;
> +	r->last_progress_update = 0;
> +	r->bytes_received_last_update = 0;
> +	r->target = "Snapshot";

Nontrivial amount of duplicate code, a helper would be better.

> @@ -673,7 +679,7 @@ out:
>  }
>
>  const char * const cmd_send_usage[] = {
> -	"btrfs send [-ve] [-p <parent>] [-c <clone-src>] [-f <outfile>] <subvol> [<subvol>...]",
> +	"btrfs send [-veo] [-p <parent>] [-c <clone-src>] [-f <outfile>] <subvol> [<subvol>...]",
>  	"Send the subvolume(s) to stdout.",
>  	"Sends the subvolume(s) specified by <subvol> to stdout.",
>  	"By default, this will send the whole subvolume. To do an incremental",
> @@ -697,5 +703,9 @@ const char * const cmd_send_usage[] = {
>  	"-f <outfile>     Output is normally written to stdout. To write to",
>  	"                 a file, use this option. An alternative would be to",
>  	"                 use pipes.",
> +	"-o               Obtain the total data size for each subvolume or ",

UI: -o is sometimes used as an option for 'output', which makes sense in context of send, but is already done via -f. I'm not sure if it's a good choice. 'p' is already occupied. As an alternative: how about -s? Mnemonic for 'size'.

> +	"                 snapshot to send. This demands additional processing",
> +	"                 (mostly IO bound) but is useful for the receive ",
> +	"                 command to report progress.",
>  	NULL
>  };
[PATCH] btrfs-progs: fix typo in subvol list usage
Signed-off-by: David Disseldorp <dd...@suse.de>
---
 cmds-subvolume.c | 2 +-
 man/btrfs.8.in   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 5e821c7..75a7385 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -390,7 +390,7 @@ static const char * const cmd_subvol_list_usage[] = {
 	"             to the given path",
 	"-c           print the ogeneration of the subvolume",
 	"-g           print the generation of the subvolume",
-	"-o           print only subvolumes bellow specified path",
+	"-o           print only subvolumes below the specified path",
 	"-u           print the uuid of subvolumes (and snapshots)",
 	"-q           print the parent uuid of the snapshots",
 	"-t           print the result as a table",
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 8fea115..4221cc2 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -214,7 +214,7 @@ print the ogeneration of the subvolume, aliases: ogen or origin generation.
 .IP \fB-g\fP 5
 print the generation of the subvolume.
 .IP \fB-o\fP 5
-print only subvolumes bellow specified path.
+print only subvolumes below the specified path.
 .IP \fB-u\fP 5
 print the UUID of the subvolume.
 .IP \fB-q\fP 5
-- 
1.8.4.5
Re: Kernel crash triggered by dd to file with memcg, worst on btrfs
Hi,

what kernel version are you running?

Marian

On 04/16/2014 08:42 PM, Richard Davies wrote:

Hi all,

I have a test case in which I can often crash an entire machine by running dd to a file with a memcg with relatively generous limits. This is simplified from real world problems with heavy disk i/o inside containers.

The crashes are easy to trigger when dding to create a file on btrfs. On ext3, typically there is just an error in the kernel log, although occasionally it also crashes.

I'm not a kernel developer, but I'm happy to help with any further debugging or try patches.

[I have also just reported a different but similar bug with untar in a memcg http://marc.info/?l=linux-mm&m=139766321822891 That one is not btrfs-linked]

To replicate on Linux 3.14.0, run the following 8 commands:

# mkdir -p /sys/fs/cgroup/test/
# cat /sys/fs/cgroup/cpuset.cpus > /sys/fs/cgroup/test/cpuset.cpus
# cat /sys/fs/cgroup/cpuset.mems > /sys/fs/cgroup/test/cpuset.mems
# echo $((1<<30)) > /sys/fs/cgroup/test/memory.limit_in_bytes
# echo $((1<<30)) > /sys/fs/cgroup/test/memory.memsw.limit_in_bytes
# echo $((1<<28)) > /sys/fs/cgroup/test/memory.kmem.limit_in_bytes
# echo $$ > /sys/fs/cgroup/test/tasks
# dd if=/dev/zero of=./crashme bs=2M

and leave until several GB of data have been written.

When running into a btrfs filesystem, this dd crashes the entire machine about 50% of the time for me, generating a console log as copied below. If the initial dd is running smoothly, I can often get it to crash by stopping the dd with ctrl-c and starting it again with a different output file, perhaps repeating this a few times.

When running into an ext3 filesystem, this dd typically doesn't crash the machine but just outputs errors in the kernel log as copied below. Occasionally it will still crash.

I am happy to help with extra information on kernel configuration, but I hope that the above is sufficient for others to replicate. I'm also happy to try suggestions and patches.

Thanks in advance for your help,

Richard.
Ext3 kernel error log = 17:20:05 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x20) 17:20:05 kernel: cache: ext4_extent_status(2:test), object size: 40, buffer size: 40, default order: 0, min order: 0 17:20:05 kernel: node 0: slabs: 375, objs: 38250, free: 0 17:20:05 kernel: node 1: slabs: 128, objs: 13056, free: 0 (many times) Btrfs kernel console crash log == BUG: unable to handle kernel paging request at fffe36a55230 IP: [810f5055] cpuacct_charge+0x35/0x58 PGD 1b5d067 PUD 0 Thread overran stack, or stack corrupted Oops: [#1] PREEMPT SMP Modules linked in: CPU: 6 PID: 5729 Comm: dd Not tainted 3.14.0-elastic #1 Hardware name: Supermicro H8DMT-IBX/H8DMT-IBX, BIOS 080014 10/17/2009 task: 88040a6fdac0 ti: 8800d69cc000 task.ti: 8800d69cc000 RIP: 0010:[810f5055] [810f5055] cpuacct_charge+0x35/0x58 RSP: 0018:880827d03d88 EFLAGS: 00010002 RAX: 60f7d80032d0 RBX: 88040a6fdac0 RCX: d69cc148 RDX: 88081191a180 RSI: 000ebb99 RDI: 88040a6fdac0 RBP: 880827d03da8 R08: R09: 880827ffc348 R10: 880827ffc2a0 R11: 880827ffc340 R12: d69cc148 R13: 000ebb99 R14: fffebb99 R15: 88040a6fdac0 FS: 7f508b54e6f0() GS:880827d0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: fffe36a55230 CR3: 00080e9d2000 CR4: 07e0 Stack: 88040a6fdac0 880810fe2800 000ebb99 880827d03dd8 810ebbb3 88040a6fdb28 880810fe2800 880827d11bc0 880827d03e28 810eeaaf Call Trace: IRQ [810ebbb3] update_curr+0xc2/0x11e [810eeaaf] task_tick_fair+0x3d/0x631 [810e5bb7] scheduler_tick+0x57/0xba [81108eaf] ? tick_nohz_handler+0xcf/0xcf [810cb73d] update_process_times+0x55/0x66 [81108f2b] tick_sched_timer+0x7c/0x9b [810dd0d2] __run_hrtimer+0x57/0xcc [810dd4c7] hrtimer_interrupt+0xd0/0x1db [810e761b] ? 
__vtime_account_system+0x2d/0x31 [8105f8c1] local_apic_timer_interrupt+0x53/0x58 [81060475] smp_apic_timer_interrupt+0x3e/0x51 [8186299d] apic_timer_interrupt+0x6d/0x80 EOI Code: 54 53 48 89 fb 48 83 ec 08 48 8b 47 08 4c 63 60 18 e8 84 8c 00 00 48 8b 83 a0 06 00 00 4c 89 e1 48 8b 50 48 48 8b 82 80 00 00 00 48 03 04 cd f0 47 bf 81 4c 01 28 48 8b 52 40 48 85 d2 75 e5 e8 RIP [810f5055] cpuacct_charge+0x35/0x58 RSP 880827d03d88 CR2: fffe36a55230 ---[ end trace b449af50c3a0711c ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff) -- To unsubscribe from this list: send the line unsubscribe cgroups in the body of a message to majord...@vger.kernel.org More majordomo info at
Re: Kernel crash triggered by dd to file with memcg, worst on btrfs
Richard Davies wrote:
> I have a test case in which I can often crash an entire machine by running dd to a file with a memcg with relatively generous limits. This is simplified from real world problems with heavy disk i/o inside containers.
>
> The crashes are easy to trigger when dding to create a file on btrfs. On ext3, typically there is just an error in the kernel log, although occasionally it also crashes.

A further note - the ext3 SLUB errors occur when dding into an ext3 file alone.

The few ext3 crashes occurred when dding into a btrfs file for a while without a crash, then switching to dding into an ext3 file. So the ext3 crashes could actually be due to btrfs cached data still in memory - i.e. all crashes could be due to btrfs use.

Richard.
Kernel crash triggered by dd to file with memcg, worst on btrfs
Hi all,

I have a test case in which I can often crash an entire machine by running dd to a file with a memcg with relatively generous limits. This is simplified from real world problems with heavy disk i/o inside containers.

The crashes are easy to trigger when dding to create a file on btrfs. On ext3, typically there is just an error in the kernel log, although occasionally it also crashes.

I'm not a kernel developer, but I'm happy to help with any further debugging or try patches.

[I have also just reported a different but similar bug with untar in a memcg http://marc.info/?l=linux-mm&m=139766321822891 That one is not btrfs-linked]

To replicate on Linux 3.14.0, run the following 8 commands:

# mkdir -p /sys/fs/cgroup/test/
# cat /sys/fs/cgroup/cpuset.cpus > /sys/fs/cgroup/test/cpuset.cpus
# cat /sys/fs/cgroup/cpuset.mems > /sys/fs/cgroup/test/cpuset.mems
# echo $((1<<30)) > /sys/fs/cgroup/test/memory.limit_in_bytes
# echo $((1<<30)) > /sys/fs/cgroup/test/memory.memsw.limit_in_bytes
# echo $((1<<28)) > /sys/fs/cgroup/test/memory.kmem.limit_in_bytes
# echo $$ > /sys/fs/cgroup/test/tasks
# dd if=/dev/zero of=./crashme bs=2M

and leave until several GB of data have been written.

When running into a btrfs filesystem, this dd crashes the entire machine about 50% of the time for me, generating a console log as copied below. If the initial dd is running smoothly, I can often get it to crash by stopping the dd with ctrl-c and starting it again with a different output file, perhaps repeating this a few times.

When running into an ext3 filesystem, this dd typically doesn't crash the machine but just outputs errors in the kernel log as copied below. Occasionally it will still crash.

I am happy to help with extra information on kernel configuration, but I hope that the above is sufficient for others to replicate. I'm also happy to try suggestions and patches.

Thanks in advance for your help,

Richard.
Ext3 kernel error log = 17:20:05 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x20) 17:20:05 kernel: cache: ext4_extent_status(2:test), object size: 40, buffer size: 40, default order: 0, min order: 0 17:20:05 kernel: node 0: slabs: 375, objs: 38250, free: 0 17:20:05 kernel: node 1: slabs: 128, objs: 13056, free: 0 (many times) Btrfs kernel console crash log == BUG: unable to handle kernel paging request at fffe36a55230 IP: [810f5055] cpuacct_charge+0x35/0x58 PGD 1b5d067 PUD 0 Thread overran stack, or stack corrupted Oops: [#1] PREEMPT SMP Modules linked in: CPU: 6 PID: 5729 Comm: dd Not tainted 3.14.0-elastic #1 Hardware name: Supermicro H8DMT-IBX/H8DMT-IBX, BIOS 080014 10/17/2009 task: 88040a6fdac0 ti: 8800d69cc000 task.ti: 8800d69cc000 RIP: 0010:[810f5055] [810f5055] cpuacct_charge+0x35/0x58 RSP: 0018:880827d03d88 EFLAGS: 00010002 RAX: 60f7d80032d0 RBX: 88040a6fdac0 RCX: d69cc148 RDX: 88081191a180 RSI: 000ebb99 RDI: 88040a6fdac0 RBP: 880827d03da8 R08: R09: 880827ffc348 R10: 880827ffc2a0 R11: 880827ffc340 R12: d69cc148 R13: 000ebb99 R14: fffebb99 R15: 88040a6fdac0 FS: 7f508b54e6f0() GS:880827d0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: fffe36a55230 CR3: 00080e9d2000 CR4: 07e0 Stack: 88040a6fdac0 880810fe2800 000ebb99 880827d03dd8 810ebbb3 88040a6fdb28 880810fe2800 880827d11bc0 880827d03e28 810eeaaf Call Trace: IRQ [810ebbb3] update_curr+0xc2/0x11e [810eeaaf] task_tick_fair+0x3d/0x631 [810e5bb7] scheduler_tick+0x57/0xba [81108eaf] ? tick_nohz_handler+0xcf/0xcf [810cb73d] update_process_times+0x55/0x66 [81108f2b] tick_sched_timer+0x7c/0x9b [810dd0d2] __run_hrtimer+0x57/0xcc [810dd4c7] hrtimer_interrupt+0xd0/0x1db [810e761b] ? 
__vtime_account_system+0x2d/0x31 [8105f8c1] local_apic_timer_interrupt+0x53/0x58 [81060475] smp_apic_timer_interrupt+0x3e/0x51 [8186299d] apic_timer_interrupt+0x6d/0x80 EOI Code: 54 53 48 89 fb 48 83 ec 08 48 8b 47 08 4c 63 60 18 e8 84 8c 00 00 48 8b 83 a0 06 00 00 4c 89 e1 48 8b 50 48 48 8b 82 80 00 00 00 48 03 04 cd f0 47 bf 81 4c 01 28 48 8b 52 40 48 85 d2 75 e5 e8 RIP [810f5055] cpuacct_charge+0x35/0x58 RSP 880827d03d88 CR2: fffe36a55230 ---[ end trace b449af50c3a0711c ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4 v2] Btrfs-progs: send, implement total data size callback and progress report
On Wed, Apr 16, 2014 at 6:43 PM, David Sterba <dste...@suse.cz> wrote:
> On Wed, Apr 16, 2014 at 03:56:15PM +0100, Filipe David Borba Manana wrote:
>> V2: Added new send ioctl flag BTRFS_SEND_FLAG_SUPPORT_FALLOCATE. A version 2 stream is now only produced if the ioctl caller specifies at least one of the new send flags (BTRFS_SEND_FLAG_SUPPORT_FALLOCATE or BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE).
>
> Good.
>
>> @@ -156,6 +165,12 @@ static int process_subvol(const char *path, const u8 *uuid, u64 ctransid,
>>  		goto out;
>>
>>  	r->cur_subvol = calloc(1, sizeof(*r->cur_subvol));
>> +	r->total_data_size = 0;
>> +	r->bytes_received = 0;
>> +	r->progress = 0.0;
>> +	r->last_progress_update = 0;
>> +	r->bytes_received_last_update = 0;
>> +	r->target = "Subvolume";
>>
>>  	if (strlen(r->dest_dir_path) == 0)
>>  		r->cur_subvol->path = strdup(path);
>> @@ -205,6 +220,12 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
>>  		goto out;
>>
>>  	r->cur_subvol = calloc(1, sizeof(*r->cur_subvol));
>> +	r->total_data_size = 0;
>> +	r->bytes_received = 0;
>> +	r->progress = 0.0;
>> +	r->last_progress_update = 0;
>> +	r->bytes_received_last_update = 0;
>> +	r->target = "Snapshot";
>
> Nontrivial amount of duplicate code, a helper would be better.

Agree about the duplication, but can't agree with being non-trivial: resetting to 0 a few counters/percentage/timestamp :)

>> @@ -673,7 +679,7 @@ out:
>>  }
>>
>>  const char * const cmd_send_usage[] = {
>> -	"btrfs send [-ve] [-p <parent>] [-c <clone-src>] [-f <outfile>] <subvol> [<subvol>...]",
>> +	"btrfs send [-veo] [-p <parent>] [-c <clone-src>] [-f <outfile>] <subvol> [<subvol>...]",
>>  	"Send the subvolume(s) to stdout.",
>>  	"Sends the subvolume(s) specified by <subvol> to stdout.",
>>  	"By default, this will send the whole subvolume. To do an incremental",
>> @@ -697,5 +703,9 @@ const char * const cmd_send_usage[] = {
>>  	"-f <outfile>     Output is normally written to stdout. To write to",
>>  	"                 a file, use this option. An alternative would be to",
>>  	"                 use pipes.",
>> +	"-o               Obtain the total data size for each subvolume or ",
>
> UI: -o is sometimes used as an option for 'output', which makes sense in context of send, but is already done via -f. I'm not sure if it's a good choice. 'p' is already occupied. As an alternative: how about -s? Mnemonic for 'size'.

Fine for me. Don't have a strong opinion about the letter.

thanks David

>> +	"                 snapshot to send. This demands additional processing",
>> +	"                 (mostly IO bound) but is useful for the receive ",
>> +	"                 command to report progress.",
>>  	NULL
>>  };

-- 
Filipe David Manana,

Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men.
[PATCH 2/4 v3] Btrfs-progs: send, implement total data size callback and progress report
This is a followup to the kernel patch titled:

    Btrfs: send, implement total data size command to allow for progress estimation

This makes the btrfs send and receive commands aware of the new send command, named BTRFS_SEND_C_TOTAL_DATA_SIZE, which tells us the amount of file data that is new between the parent and send snapshots/roots. As this command immediately follows the commands to start a snapshot/subvolume, it can be used to report and compute progress, by keeping a counter that is incremented with the data length of each write, clone and fallocate command that is received from the stream.

Example:

    $ btrfs send -s /mnt/sdd/snap_base | btrfs receive /mnt/sdc
    At subvol /mnt/sdd/snap_base
    At subvol snap_base
    About to receive 9212392667 bytes
    Subvolume /mnt/sdc//snap_base, 4059722426 / 9212392667 bytes received, 44.07%, 40.32MB/s

    $ btrfs send -s -p /mnt/sdd/snap_base /mnt/sdd/snap_incr | btrfs receive /mnt/sdc
    At subvol /mnt/sdd/snap_incr
    At subvol snap_incr
    About to receive 9571342213 bytes
    Subvolume /mnt/sdc//snap_incr, 6557345221 / 9571342213 bytes received, 68.51%, 51.04MB/s

At the moment progress is only reported by btrfs-receive, but it is possible and simple to do it for btrfs-send too, so that we can get a progress report when not piping btrfs-send output to btrfs-receive (directly to a file).

Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
---
V2: Added new send ioctl flag BTRFS_SEND_FLAG_SUPPORT_FALLOCATE. A version 2 stream is now only produced if the ioctl caller specifies at least one of the new send flags (BTRFS_SEND_FLAG_SUPPORT_FALLOCATE or BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE).
V3: Renamed option -o to -s, removed some duplicated code (progress reset).
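The percentage shown in the example output follows directly from the two byte counters. As an illustration only (this is not the patch's update_progress(); `progress_percent` is a hypothetical name and the clamping at 100% is an assumption of this sketch), the arithmetic might look like:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: percentage of the announced total that has been
 * received so far. A total of 0 means the sender did not announce a
 * size, so there is nothing meaningful to report.
 */
static double progress_percent(uint64_t bytes_received,
			       uint64_t total_data_size)
{
	if (total_data_size == 0)
		return 0.0;
	if (bytes_received >= total_data_size)
		return 100.0;	/* clamp in case the counter overshoots */
	return 100.0 * (double)bytes_received / (double)total_data_size;
}
```

Feeding it the counters from the first example, 4059722426 of 9212392667 bytes, gives a value that prints as 44.07 with two decimals, which matches the sample output.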
 Documentation/btrfs-send.txt |  3 ++
 cmds-receive.c               | 91
 cmds-send.c                  | 14 ++-
 send-stream.c                |  4 ++
 send-stream.h                |  1 +
 5 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/Documentation/btrfs-send.txt b/Documentation/btrfs-send.txt
index 18a98fa..a37d63c 100644
--- a/Documentation/btrfs-send.txt
+++ b/Documentation/btrfs-send.txt
@@ -40,6 +40,9 @@ Use this snapshot as a clone source for an incremental send (multiple allowed).
 -f <outfile>::
 Output is normally written to stdout. To write to a file, use this option.
 An alternative would be to use pipes.
+-s::
+Obtain the total data size for each subvolume or snapshot to send. This demands additional
+processing (mostly IO bound) but is useful for the receive command to report progress.
 
 EXIT STATUS
 -----------
diff --git a/cmds-receive.c b/cmds-receive.c
index d6cd3da..4cbf276 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -32,6 +32,7 @@
 #include <ftw.h>
 #include <wait.h>
 #include <assert.h>
+#include <time.h>
 
 #include <sys/stat.h>
 #include <sys/types.h>
@@ -71,6 +72,14 @@ struct btrfs_receive
 	struct subvol_uuid_search sus;
 
 	int honor_end_cmd;
+
+	/* For the subvolume/snapshot we're currently receiving. */
+	u64 total_data_size;
+	u64 bytes_received;
+	time_t last_progress_update;
+	u64 bytes_received_last_update;
+	float progress;
+	const char *target;
 };
 
 static int finish_subvol(struct btrfs_receive *r)
@@ -143,6 +152,16 @@ out:
 	return ret;
 }
 
+static void reset_progress(struct btrfs_receive *r, const char *dest)
+{
+	r->total_data_size = 0;
+	r->bytes_received = 0;
+	r->progress = 0.0;
+	r->last_progress_update = 0;
+	r->bytes_received_last_update = 0;
+	r->target = dest;
+}
+
 static int process_subvol(const char *path, const u8 *uuid, u64 ctransid,
 			  void *user)
 {
@@ -156,6 +175,7 @@ static int process_subvol(const char *path, const u8 *uuid, u64 ctransid,
 		goto out;
 
 	r->cur_subvol = calloc(1, sizeof(*r->cur_subvol));
+	reset_progress(r, "Subvolume");
 
 	if (strlen(r->dest_dir_path) == 0)
 		r->cur_subvol->path = strdup(path);
@@ -205,6 +225,7 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
 		goto out;
 
 	r->cur_subvol = calloc(1, sizeof(*r->cur_subvol));
+	reset_progress(r, "Snapshot");
 
 	if (strlen(r->dest_dir_path) == 0)
 		r->cur_subvol->path = strdup(path);
@@ -287,6 +308,73 @@ out:
 	return ret;
 }
 
+static int process_total_data_size(u64 size, void *user)
+{
+	struct btrfs_receive *r = user;
+
+	r->total_data_size = size;
+	fprintf(stdout, "About to receive %llu bytes\n", size);
+
+	return 0;
+}
+
+static void update_progress(struct btrfs_receive *r, u64 bytes)
+{
+	float new_progress;
+	time_t now;
+	time_t tdiff;
+
+	if (r->total_data_size == 0)
+		return;
+
+	r->bytes_received += bytes;
+
+	now = time(NULL);
+	tdiff = now -
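
[Editor's sketch] The archived patch is cut off inside update_progress(), so the percentage computation itself is not visible here. As a rough illustration of the idea described in the patch message (a running byte counter compared against the announced total), the following shell function reproduces the percentages from the example output. The function name progress_percent and the integer rounding are assumptions made for this sketch; the patch itself uses a float in C and may round differently.

```shell
#!/bin/sh
# Sketch only: illustrates the progress arithmetic, not the patch's C code.

# progress_percent RECEIVED TOTAL
# Prints the percentage with two decimals (rounded to nearest), or
# nothing when TOTAL is 0, i.e. when no total size was announced.
progress_percent() {
    received=$1
    total=$2
    [ "$total" -eq 0 ] && return 0
    # Work in hundredths of a percent to stay in integer arithmetic.
    hundredths=$(( (received * 10000 + total / 2) / total ))
    printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

# Byte counts taken from the example in the patch message.
progress_percent 4059722426 9212392667   # prints 44.07
progress_percent 6557345221 9571342213   # prints 68.51
```

Returning early when the total is zero mirrors the check at the top of update_progress(): a stream produced without -s never announces a size, so there is nothing to report against.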
[PATCH 3/4 v3] Btrfs-progs: send, implement fallocate command callback
The fallocate send stream command, added in stream version 2, is used to
pre-allocate space for files and punch file holes. This change implements
the callback for that new command, using the fallocate function from the
standard C library to carry out the specified action (allocate file space
or punch a file hole).

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: Use the new send ioctl flag BTRFS_SEND_FLAG_SUPPORT_FALLOCATE if the user
    asks for it (-a command line option), which will make the kernel generate
    a version 2 send stream, so that old clients aren't affected.
V3: Rebased on new patchset (new version of patch 2/4).

 Documentation/btrfs-send.txt |  3 +++
 cmds-receive.c               | 38 ++
 cmds-send.c                  | 12 ++--
 send-stream.c                | 13 +
 send-stream.h                |  2 ++
 5 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/Documentation/btrfs-send.txt b/Documentation/btrfs-send.txt
index a37d63c..6893b90 100644
--- a/Documentation/btrfs-send.txt
+++ b/Documentation/btrfs-send.txt
@@ -43,6 +43,9 @@ An alternative would be to use pipes.
 -s::
 Obtain the total data size for each subvolume or snapshot to send. This demands additional
 processing (mostly IO bound) but is useful for the receive command to report progress.
+-a::
+Use fallocate to pre-allocate file extents and to punch file holes, instead of writing zeroes
+to files.
 
 EXIT STATUS
 -----------
diff --git a/cmds-receive.c b/cmds-receive.c
index 4cbf276..908f968 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -41,6 +41,7 @@
 #include <sys/types.h>
 #include <sys/xattr.h>
 #include <uuid/uuid.h>
+#include <linux/falloc.h>
 
 #include "ctree.h"
 #include "ioctl.h"
@@ -887,6 +888,42 @@ out:
 	return ret;
 }
 
+static int process_fallocate(const char *path, u32 flags, u64 offset,
+			     u64 len, void *user)
+{
+	struct btrfs_receive *r = user;
+	char *full_path = path_cat(r->full_subvol_path, path);
+	int mode = 0;
+	int ret;
+
+	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE)
+		mode |= FALLOC_FL_KEEP_SIZE;
+	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE)
+		mode |= FALLOC_FL_PUNCH_HOLE;
+
+	if (g_verbose >= 2)
+		fprintf(stderr,
+			"fallocate %s - flags %u, offset %llu, len %llu\n",
+			path, flags, offset, len);
+
+	ret = open_inode_for_write(r, full_path);
+	if (ret < 0)
+		goto out;
+
+	ret = fallocate(r->write_fd, mode, offset, len);
+	if (ret) {
+		ret = -errno;
+		fprintf(stderr,
+			"ERROR: fallocate against %s failed. %s\n",
+			path, strerror(-ret));
+		goto out;
+	}
+	update_progress(r, len);
+
+out:
+	free(full_path);
+	return ret;
+}
 
 static struct btrfs_send_ops send_ops = {
 	.subvol = process_subvol,
@@ -910,6 +947,7 @@ static struct btrfs_send_ops send_ops = {
 	.chown = process_chown,
 	.utimes = process_utimes,
 	.total_data_size = process_total_data_size,
+	.fallocate = process_fallocate,
 };
 
 static int do_receive(struct btrfs_receive *r, const char *tomnt, int r_fd)
diff --git a/cmds-send.c b/cmds-send.c
index 6f2d7c1..7203956 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -46,6 +46,7 @@
 
 static int g_verbose = 0;
 static int g_total_data_size = 0;
+static int g_fallocate = 0;
 
 struct btrfs_send {
 	int send_fd;
@@ -284,6 +285,8 @@ static int do_send(struct btrfs_send *send, u64 parent_root_id,
 		io_send.flags |= BTRFS_SEND_FLAG_OMIT_END_CMD;
 	if (g_total_data_size)
 		io_send.flags |= BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE;
+	if (g_fallocate)
+		io_send.flags |= BTRFS_SEND_FLAG_SUPPORT_FALLOCATE;
 	ret = ioctl(subvol_fd, BTRFS_IOC_SEND, &io_send);
 	if (ret) {
 		ret = -errno;
@@ -427,7 +430,7 @@ int cmd_send(int argc, char **argv)
 	memset(&send, 0, sizeof(send));
 	send.dump_fd = fileno(stdout);
 
-	while ((c = getopt(argc, argv, "vesc:f:i:p:")) != -1) {
+	while ((c = getopt(argc, argv, "vesac:f:i:p:")) != -1) {
 		switch (c) {
 		case 'v':
 			g_verbose++;
@@ -517,6 +520,9 @@ int cmd_send(int argc, char **argv)
 		case 's':
 			g_total_data_size = 1;
 			break;
+		case 'a':
+			g_fallocate = 1;
+			break;
 		case '?':
 		default:
 			fprintf(stderr, "ERROR: send args invalid.\n");
@@ -679,7 +685,7 @@ out:
 }
 
 const char * const cmd_send_usage[] = {
-	"btrfs send [-ves] [-p <parent>] [-c <clone-src>] [-f <outfile>] <subvol> [<subvol>...]",
+	"btrfs send [-vesa] [-p <parent>] [-c
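
[Editor's sketch] process_fallocate() above translates the stream's flags into a mode for fallocate(2). To make that mapping easier to try by hand, the sketch below builds the equivalent util-linux fallocate(1) command line instead of calling the syscall. The numeric values of FLAG_KEEP_SIZE and FLAG_PUNCH_HOLE and the helper name fallocate_cmd are hypothetical; the real BTRFS_SEND_A_FALLOCATE_FLAG_* values are not shown in this thread.

```shell
#!/bin/sh
# Hypothetical flag values, for illustration only; the real
# BTRFS_SEND_A_FALLOCATE_FLAG_* constants are not visible in this thread.
FLAG_KEEP_SIZE=1
FLAG_PUNCH_HOLE=2

# fallocate_cmd FLAGS OFFSET LEN PATH
# Prints a util-linux fallocate(1) command line that corresponds to the
# mode the receive callback would pass to fallocate(2).
fallocate_cmd() {
    flags=$1
    offset=$2
    len=$3
    path=$4
    opts=""
    if [ $(( flags & FLAG_KEEP_SIZE )) -ne 0 ]; then
        opts="$opts --keep-size"
    fi
    if [ $(( flags & FLAG_PUNCH_HOLE )) -ne 0 ]; then
        opts="$opts --punch-hole"
    fi
    echo "fallocate$opts --offset $offset --length $len $path"
}

# Punch a hole (both flags set), then preallocate beyond EOF (keep size only).
fallocate_cmd 3 4096 8192 foo
fallocate_cmd 1 0 65536 foo
```

Printing the command rather than running it keeps the sketch independent of whether the underlying filesystem supports hole punching. On Linux, FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE, which is why the hole-punching example sets both flags.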
Re: How do I find the physical block number?
On 16 April 2014 15:27, Aastha Mehta aasth...@gmail.com wrote:
> Hello,
>
> I have created a 500GB partition on my HDD and formatted it for btrfs.
> I created a file on it.
>
> # echo "tmp data in the tmp file.." > /mnt/btrfs/tmp-file
> # umount /mnt/btrfs
>
> Next I want to know the blocks allocated for the file and I used
> filefrag for it. I get some information as follows -
>
> # mount -o max_inline=0 /dev/sdc2 /mnt/btrfs
> # filefrag -v /mnt/btrfs/tmp-file
> Filesystem type is: 9123683e
> File size of /mnt/btrfs/tmp-file is 27 (1 block, blocksize 4096)
>  ext logical physical expected length flags
>    0       0 65924123               1 eof
> /mnt/btrfs/tmp-file: 1 extent found
>
> Now, I want to read the same data from the disk directly. I tried the
> following -
>
> block 65924123 = byte (65924123*4096) = 270025207808
>
> # dd if=/dev/sdc2 of=tmp-file skip=270025207808 bs=1 count=4096
> # cat tmp-file
>
> I do not get the file's contents, only garbage.
>
> I read somewhere that the physical block number shown in filefrag may
> actually be a logical block for the file system and it has an
> additional translation to physical block number. So next I tried the
> following -
>
> # btrfs-map-logical -l 65924123 /dev/sdc2
> mirror 1 logical 65924123 physical 74312731 device /dev/sdc2
> mirror 2 logical 65924123 physical 1148054555 device /dev/sdc2
>
> I again tried reading the block 74312731 using the dd command as
> above, but it is still not the right block.
>
> I want to know what the physical block number returned by filefrag
> means, why there are two mappings for the above block number, and how
> I can find the exact physical disk block number the file system
> actually writes to.
>
> My sdc has the following partitions:
>
> Device Boot      Start        End    Blocks  Id System
> /dev/sdc1         2048  419432447 209715200  83 Linux
> /dev/sdc2   1468008448 2516584447 524288000  83 Linux (BTRFS)
> /dev/sdc3    419432448 1468008447 524288000  83 Linux
>
> Thanks,
> Aastha.

I realized my mistake in using the btrfs-map-logical command. It takes a
byte address, not a block number, so it should have been

# btrfs-map-logical -l 270025207808 /dev/sdc2

Now, everything works fine. Please ignore my post, except that it may be
useful for somebody else needing this information in future.

Thanks,
Aastha.
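
[Editor's sketch] The fix described above comes down to one conversion: filefrag reports block numbers in units of its block size, while btrfs-map-logical -l expects a byte address. A minimal illustration of that arithmetic follows; the helper name block_to_bytes is made up for this sketch, and the privileged commands are left as comments with PHYSICAL standing in for whatever byte offset btrfs-map-logical reports on a given system.

```shell
#!/bin/sh
# block_to_bytes BLOCK [BLOCKSIZE]
# Converts a block number as printed by filefrag into a byte address.
# BLOCKSIZE defaults to 4096, the value filefrag reported in the post.
block_to_bytes() {
    block=$1
    blocksize=${2:-4096}
    echo $(( block * blocksize ))
}

logical=$(block_to_bytes 65924123 4096)
echo "logical byte address: $logical"   # prints 270025207808

# The remaining steps need root and the real device, so they are shown
# as comments only. PHYSICAL is a placeholder, not a real value:
#
#   btrfs-map-logical -l "$logical" /dev/sdc2
#   dd if=/dev/sdc2 of=tmp-file bs=1 skip=PHYSICAL count=4096
```

Two mirrors may be listed for a single logical address, as in the output quoted in the post; which of them to read from is not settled in this thread.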
Re: very slow btrfs filesystem: any data needed before I wipe it?
On Mon, Apr 14, 2014 at 10:28:36AM +0000, Duncan wrote:
> But you might well be the first report where the devs have good access to
> enough detail to actually trace down the problem.
>
>> I'll wait until tomorrow night to see if the devs want anything else
>> out of it, but otherwise I'll wipe it and start over.
>
> Given that our exchange happened over the weekend and tomorrow is Monday,
> I'd consider waiting until Tuesday nite if possible, just to be sure.
> Because Monday is of course back to work day, and it's possible if there's
> something urgent (or if they're taking a long weekend for some reason),
> they won't get fully caught up from the weekend until Tuesday.

So it seems that no one is interested in that filesystem.

I probably won't have time to wipe and rebuild it until this weekend, so
I'll see how it works after a rebuild. Hopefully I won't hit the same
problems again.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
Re: [PATCH] xfstests: btrfs, test send's ability to punch holes and prealloc extents
On Wed, Apr 16, 2014 at 03:39:18PM +0100, Filipe David Manana wrote:
> On Wed, Apr 16, 2014 at 1:23 AM, Dave Chinner da...@fromorbit.com wrote:
>> On Tue, Apr 15, 2014 at 05:43:21PM +0100, Filipe David Borba Manana wrote:
>>> This test verifies that after an incremental btrfs send the replicated file has
>>> the same exact hole and data structure as in the origin filesystem. This didn't
>>> use to be the case before the send stream version 2 - holes were sent as write
>>> operations of 0 valued bytes instead of punching holes with the fallocate system
>>> call, and pre-allocated extents were sent as well as write operations of 0 valued
>>> bytes instead of instructions for the receiver to use the fallocate system call.
>>> Also checks that prealloc extents that lie beyond the file's size are replicated
>>> by an incremental send.
>>
>> Can you wrap commit messages at 68 columns?
>>
>>> +md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
>>> +# List all hole and data segments.
>>> +$XFS_IO_PROG -r -c "seek -r -a 0" $SCRATCH_MNT/mysnap2/foo
>>> +# List all extents, we're interested here in prealloc extents that lie beyond
>>> +# the file's size.
>>> +$XFS_IO_PROG -r -c "fiemap -l" $SCRATCH_MNT/mysnap2/foo | _filter_scratch
>>
>> That dumps raw block numbers into the golden output. _filter_fiemap
>> is probably needed here.
>
> Hum, just tried it and uploaded a v2. However I'm now noticing it
> doesn't do everything I had in mind. _filter_fiemap is not showing the
> extents falloc -k created, only a collapsed range of holes. So my
> intention is to verify not just holes, but also the extents created by
> 'falloc -k'.
>
> The following filter I just made locally gives me that:
>
> _filter_all_fiemap()
> {
>     awk --posix '
>         $3 ~ /hole/ {
>             print $1, $2, $3;
>             next;
>         }
>         $3 ~ /[[:xdigit:]]*..[[:xdigit:]]/ {
>             print $1, $2, "extent";
>             next;
>         }'
> }

Which is effectively _filter_hole_fiemap(), except it coalesces
adjacent extents into a single range.

I'd suggest moving the _filter_* functions from common/punch to
common/filter, and using _filter_hole_fiemap() as there's no
guarantee that you'll get individual extents for each falloc -k
range - they could be allocated contiguously, and hence the number
of extents reported can change from run to run. That's the reason
why the filters coalesce adjacent file offsets of the same type - we
care whether the range of the file contains the correct extent type,
not how fragmented the range is.

> (nicely printed/indented at
> https://friendpaste.com/1JtG5bts2Sz0LWhUutCpzE, as e-mail is not good
> for code pasting)

Pasting code works fine for me ;)

> Which gives me:
>
> 0: [0..191]: extent
> 1: [192..199]: extent
> 2: [200..231]: extent
> 3: [232..239]: extent
> 4: [240..287]: extent
> 5: [288..295]: extent
> 6: [296..487]: extent
> 7: [488..495]: extent
> 8: [496..519]: hole
> 9: [520..527]: extent
> 10: [528..583]: extent
> 11: [584..591]: extent
> 12: [592..2543]: extent
> 13: [2544..17575]: hole
> 14: [17576..21487]: extent

Also, you're trimming off the block count, so you can drop the -l
option to the fiemap command.

> Versus only (from _filter_fiemap):
>
> 0: [496..17575]: hole

Maybe the -l option is confusing the filter, it should be giving:

0: [0..495]: data
1: [496..519]: hole
2: [520..2543]: data
3: [2544..17575]: hole
4: [17576..21487]: data

Though if there are unwritten extents, it will say unwritten
rather than data.

_filter_hole_fiemap should give:

0: [0..495]: extent
1: [496..519]: hole
2: [520..2543]: extent
3: [2544..17575]: hole
4: [17576..21487]: extent

Which tells you that everything you asked for was allocated...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
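
[Editor's sketch] Dave's point is that the golden output should record which ranges are holes and which are extents, not how many pieces the allocator happened to use. The function below is a minimal, hypothetical stand-in for that coalescing step; it is not the actual _filter_hole_fiemap from xfstests' common/punch, and the name coalesce_extents is invented for this sketch. It merges consecutive lines of the same type whose block ranges touch and renumbers the result.

```shell
#!/bin/sh
# coalesce_extents: reads "N: [start..end]: type" lines on stdin, merges
# runs of adjacent ranges that share a type, and renumbers from 0.
# Sketch only; not the xfstests filter.
coalesce_extents() {
    awk '
    {
        # $2 looks like "[0..191]:"; splitting on non-digits leaves the
        # start in r[2] and the end in r[3] (r[1] is empty).
        split($2, r, /[^0-9]+/)
        start = r[2] + 0
        end = r[3] + 0
        type = $3
        if (n > 0 && type == ptype && start == pend + 1) {
            pend = end                      # extend the current run
        } else {
            if (n > 0)
                printf "%d: [%d..%d]: %s\n", n - 1, pstart, pend, ptype
            n++
            pstart = start; pend = end; ptype = type
        }
    }
    END {
        if (n > 0)
            printf "%d: [%d..%d]: %s\n", n - 1, pstart, pend, ptype
    }'
}

# The per-extent listing quoted in the message above.
coalesce_extents <<'EOF'
0: [0..191]: extent
1: [192..199]: extent
2: [200..231]: extent
3: [232..239]: extent
4: [240..287]: extent
5: [288..295]: extent
6: [296..487]: extent
7: [488..495]: extent
8: [496..519]: hole
9: [520..527]: extent
10: [528..583]: extent
11: [584..591]: extent
12: [592..2543]: extent
13: [2544..17575]: hole
14: [17576..21487]: extent
EOF
# Output:
# 0: [0..495]: extent
# 1: [496..519]: hole
# 2: [520..2543]: extent
# 3: [2544..17575]: hole
# 4: [17576..21487]: extent
```

The five lines produced match the output Dave lists for _filter_hole_fiemap, which stays stable from run to run even if the allocator splits the falloc -k ranges differently.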
Re: [PATCH] xfstests: btrfs, test send's ability to punch holes and prealloc extents
On Thu, Apr 17, 2014 at 12:13 AM, Dave Chinner da...@fromorbit.com wrote:
> On Wed, Apr 16, 2014 at 03:39:18PM +0100, Filipe David Manana wrote:
>> On Wed, Apr 16, 2014 at 1:23 AM, Dave Chinner da...@fromorbit.com wrote:
>>> On Tue, Apr 15, 2014 at 05:43:21PM +0100, Filipe David Borba Manana wrote:
>>>> This test verifies that after an incremental btrfs send the replicated file has
>>>> the same exact hole and data structure as in the origin filesystem. This didn't
>>>> use to be the case before the send stream version 2 - holes were sent as write
>>>> operations of 0 valued bytes instead of punching holes with the fallocate system
>>>> call, and pre-allocated extents were sent as well as write operations of 0 valued
>>>> bytes instead of instructions for the receiver to use the fallocate system call.
>>>> Also checks that prealloc extents that lie beyond the file's size are replicated
>>>> by an incremental send.
>>>
>>> Can you wrap commit messages at 68 columns?
>>>
>>>> +md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
>>>> +# List all hole and data segments.
>>>> +$XFS_IO_PROG -r -c "seek -r -a 0" $SCRATCH_MNT/mysnap2/foo
>>>> +# List all extents, we're interested here in prealloc extents that lie beyond
>>>> +# the file's size.
>>>> +$XFS_IO_PROG -r -c "fiemap -l" $SCRATCH_MNT/mysnap2/foo | _filter_scratch
>>>
>>> That dumps raw block numbers into the golden output. _filter_fiemap
>>> is probably needed here.
>>
>> Hum, just tried it and uploaded a v2. However I'm now noticing it
>> doesn't do everything I had in mind. _filter_fiemap is not showing the
>> extents falloc -k created, only a collapsed range of holes. So my
>> intention is to verify not just holes, but also the extents created by
>> 'falloc -k'.
>>
>> The following filter I just made locally gives me that:
>>
>> _filter_all_fiemap()
>> {
>>     awk --posix '
>>         $3 ~ /hole/ {
>>             print $1, $2, $3;
>>             next;
>>         }
>>         $3 ~ /[[:xdigit:]]*..[[:xdigit:]]/ {
>>             print $1, $2, "extent";
>>             next;
>>         }'
>> }
>
> Which is effectively _filter_hole_fiemap(), except it coalesces
> adjacent extents into a single range.
>
> I'd suggest moving the _filter_* functions from common/punch to
> common/filter, and using _filter_hole_fiemap() as there's no
> guarantee that you'll get individual extents for each falloc -k
> range - they could be allocated contiguously, and hence the number
> of extents reported can change from run to run. That's the reason
> why the filters coalesce adjacent file offsets of the same type - we
> care whether the range of the file contains the correct extent type,
> not how fragmented the range is.

Right. Thanks for pointing it out Dave.

>> (nicely printed/indented at
>> https://friendpaste.com/1JtG5bts2Sz0LWhUutCpzE, as e-mail is not good
>> for code pasting)
>
> Pasting code works fine for me ;)
>
>> Which gives me:
>>
>> 0: [0..191]: extent
>> 1: [192..199]: extent
>> 2: [200..231]: extent
>> 3: [232..239]: extent
>> 4: [240..287]: extent
>> 5: [288..295]: extent
>> 6: [296..487]: extent
>> 7: [488..495]: extent
>> 8: [496..519]: hole
>> 9: [520..527]: extent
>> 10: [528..583]: extent
>> 11: [584..591]: extent
>> 12: [592..2543]: extent
>> 13: [2544..17575]: hole
>> 14: [17576..21487]: extent
>
> Also, you're trimming off the block count, so you can drop the -l
> option to the fiemap command.
>
>> Versus only (from _filter_fiemap):
>>
>> 0: [496..17575]: hole
>
> Maybe the -l option is confusing the filter, it should be giving:
>
> 0: [0..495]: data
> 1: [496..519]: hole
> 2: [520..2543]: data
> 3: [2544..17575]: hole
> 4: [17576..21487]: data
>
> Though if there are unwritten extents, it will say unwritten
> rather than data.
>
> _filter_hole_fiemap should give:
>
> 0: [0..495]: extent
> 1: [496..519]: hole
> 2: [520..2543]: extent
> 3: [2544..17575]: hole
> 4: [17576..21487]: extent
>
> Which tells you that everything you asked for was allocated...

Ok, figured out my mistake. _filter_fiemap works just fine, it gives
me all the information I wanted (as in your last example) as long as I
pass the -v option (and not -l, or no options at all).

Thank you very much Dave :)

> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com

-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men.
[PATCH v3] xfstests: btrfs, test send's ability to punch holes and prealloc extents
This test verifies that after an incremental btrfs send the replicated file has
the same exact hole and data structure as in the origin filesystem. This didn't
use to be the case before the send stream version 2 - holes were sent as write
operations of 0 valued bytes instead of punching holes with the fallocate system
call, and pre-allocated extents were sent as well as write operations of 0 valued
bytes instead of instructions for the receiver to use the fallocate system call.
It also checks that prealloc extents that lie beyond the file's size are replicated
by an incremental send.

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: Addressed Dave's comments, and updated btrfs send invocation, by specifying
    the new command line option (-a) that enables use of fallocate - added
    function _require_btrfs_send_fallocate_flag() to skip the test when an
    old version of btrfs-progs is found.
V3: Corrected use of fiemap with _filter_fiemap. Was passing -l instead of -v
    to fiemap, which resulted in output consisting only of a single line
    related to a hole instead of all holes and data extents (and I wanted to
    verify the falloc -k extents were preserved after the btrfs send).

 common/rc           |   9
 tests/btrfs/047     | 121
 tests/btrfs/047.out |  35 +++
 tests/btrfs/group   |   1 +
 4 files changed, 166 insertions(+)
 create mode 100755 tests/btrfs/047
 create mode 100644 tests/btrfs/047.out

diff --git a/common/rc b/common/rc
index acf419b..e94e51c 100644
--- a/common/rc
+++ b/common/rc
@@ -2262,6 +2262,15 @@ _run_btrfs_util_prog()
 	run_check $BTRFS_UTIL_PROG $*
 }
 
+_require_btrfs_send_fallocate_flag()
+{
+	$BTRFS_UTIL_PROG send 2>&1 | \
+		grep '^[ \t]*\-a[ \t]\+.* fallocate ' > /dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		_notrun "Missing btrfs-progs send -a command line option, skipped this test"
+	fi
+}
+
 init_rc()
 {
 	if [ "$iam" == new ]
diff --git a/tests/btrfs/047 b/tests/btrfs/047
new file mode 100755
index 0000000..e39b019
--- /dev/null
+++ b/tests/btrfs/047
@@ -0,0 +1,121 @@
+#! /bin/bash
+# FS QA Test No. btrfs/047
+#
+# Verify that after an incremental btrfs send the replicated file has
+# the same exact hole and data structure as in the origin filesystem.
+# This didn't use to be the case before the send stream version 2 -
+# holes were sent as write operations of 0 valued bytes instead of punching
+# holes with the fallocate system call, and pre-allocated extents were sent
+# as well as write operations of 0 valued bytes instead of instructions for
+# the receiver to use the fallocate system call. Also check that prealloc
+# extents that lie beyond the file's size are replicated by an incremental
+# send.
+#
+# More specifically, this structure preserving guarantee was added by the
+# following linux kernel commits:
+#
+#    Btrfs: send, use fallocate command to punch holes
+#    Btrfs: send, use fallocate command to allocate extents
+#
+#---
+# Copyright (c) 2014 Filipe Manana.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    rm -fr $send_files_dir
+    rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/punch
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fssum
+_require_xfs_io_fiemap
+_require_btrfs_send_fallocate_flag
+_need_to_be_root
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -f $seqres.full
+rm -fr $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+$XFS_IO_PROG -f -c "pwrite -S 0x01 -b 30 0 30" $SCRATCH_MNT/foo \
+	| _filter_xfs_io
+
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
+
+$XFS_IO_PROG -c "fpunch 10 5" $SCRATCH_MNT/foo
+$XFS_IO_PROG -c "falloc 10 5" $SCRATCH_MNT/foo
+$XFS_IO_PROG -c "pwrite -S 0xff -b 1000 12 1000" $SCRATCH_MNT/foo \
+	| _filter_xfs_io
Re: [PATCH] btrfs-progs: doc: link btrfsck to btrfs-check
-------- Original Message --------
Subject: [PATCH] btrfs-progs: doc: link btrfsck to btrfs-check
From: David Sterba dste...@suse.cz
To: linux-btrfs@vger.kernel.org
Date: 2014-04-17 01:16

> The 'btrfsck' command has been deprecated in favor of 'btrfs check'.
> For compatibility install a symlink to the btrfs-check.8 manpage.
>
> CC: Qu Wenruo quwen...@cn.fujitsu.com
> Signed-off-by: David Sterba dste...@suse.cz
> ---
>  Documentation/Makefile        | 2 ++
>  Documentation/btrfs-check.txt | 3 ++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index ec8598bb57d3..1eef9fd57da3 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -48,6 +48,7 @@ XMLTO_EXTRA = -m manpage-bold-literal.xsl
>  GZIP = gzip
>  INSTALL ?= install
>  RM ?= rm -f
> +LNS ?= ln -sf
>
>  BTRFS_VERSION = $(shell sed -n 's/.*BTRFS_BUILD_VERSION Btrfs \(.*\)/\1/p'\
>  		  ../version.h)
> @@ -73,6 +74,7 @@ install: install-man
>  install-man: man
>  	$(INSTALL) -d -m 755 $(DESTDIR)$(man8dir)
>  	$(INSTALL) -m 644 $(GZ_MAN8) $(DESTDIR)$(man8dir)
> +	$(LNS) btrfs-check.txt $(DESTDIR)$(man8dir)

Shouldn't the source of the soft link be btrfs-check.8.gz?

>
>  clean:
>  	$(RM) *.xml *.xml+ *.8 *.8.gz
> diff --git a/Documentation/btrfs-check.txt b/Documentation/btrfs-check.txt
> index ddd7fe77eca2..485a49cbc3ec 100644
> --- a/Documentation/btrfs-check.txt
> +++ b/Documentation/btrfs-check.txt
> @@ -18,6 +18,8 @@ command, it is *highly* recommended to read the following btrfs wiki before
>  executing 'btrfs check' with '--repair' option: +
>  https://btrfs.wiki.kernel.org/index.php/Btrfsck
>
> +'btrfsck' is an alias of 'btrfs check' command and is now deprecated.
> +
>  OPTIONS
>  -------
>  -s|--support superblock::
> @@ -47,4 +49,3 @@ SEE ALSO
>  `mkfs.btrfs`(8),
>  `btrfs-scrub`(8),
>  `btrfs-rescue`(8)
> -`btrfsck`(8)

Sorry to bother you, but 'btrfs-scrub', 'btrfs-rescue' and 'btrfs-restore'
also seem to mention 'btrfsck', so those references may need to be removed
as well.

Thanks,
Qu