Re: [PATCH v5 1/3] Btrfs-progs: move open_file_or_dir() to utils.c
On Thu, 07 Jun 2012 21:38:25 +0200, Goffredo Baroncelli wrote: Hi Stefan, On 05/25/2012 04:07 PM, Stefan Behrens wrote: This is a preparation step to add support for device stats. The definition of the function open_file_or_dir() is moved from common.c to utils.c in order to be able to share some common code between scrub and the device stats in the following step. That common code uses open_file_or_dir(). Since open_file_or_dir() makes use of the function dirfd(3), the required XOPEN version was raised from 6 to 7. Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de --- Makefile |4 ++-- btrfsctl.c | 28 cmds-balance.c |1 + cmds-inspect.c |1 + cmds-subvolume.c |1 + commands.h |3 --- common.c | 46 -- utils.c | 31 +-- utils.h |3 +++ 9 files changed, 37 insertions(+), 81 deletions(-) diff --git a/Makefile b/Makefile index 79818e6..fe2b432 100644 --- a/Makefile +++ b/Makefile @@ -39,8 +39,8 @@ all: version $(progs) manpages version: bash version.sh -btrfs: $(objects) btrfs.o help.o common.o $(cmds_objects) -$(CC) $(CFLAGS) -o btrfs btrfs.o help.o common.o $(cmds_objects) \ +btrfs: $(objects) btrfs.o help.o $(cmds_objects) +$(CC) $(CFLAGS) -o btrfs btrfs.o help.o $(cmds_objects) \ $(objects) $(LDFLAGS) $(LIBS) -lpthread calc-size: $(objects) calc-size.o diff --git a/btrfsctl.c b/btrfsctl.c index d45e2a7..f0584f3 100644 --- a/btrfsctl.c +++ b/btrfsctl.c @@ -63,34 +63,6 @@ static void print_usage(void) exit(1); } -static int open_file_or_dir(const char *fname) -{ -int ret; -struct stat st; -DIR *dirstream; -int fd; - -ret = stat(fname, st); -if (ret 0) { -perror(stat:); -exit(1); -} -if (S_ISDIR(st.st_mode)) { -dirstream = opendir(fname); -if (!dirstream) { -perror(opendir); -exit(1); -} -fd = dirfd(dirstream); -} else { -fd = open(fname, O_RDWR); -} -if (fd 0) { -perror(open); -exit(1); -} -return fd; -} I suppose that you want to remove open_file_or_dir() from btrfsctl.c and use the one provided by utils.c (or common.c). However I have to point out that these two function are a bit different: the first one does exit() in case of error; the second one returns an error. In any case I think that it is time to deprecate and remove btrfsctl. BR Goffredo Hi Goffredo, Yes, I was aware of this difference but did not want to spend _any_ additional effort in the deprecated btrfsctl tool. Maybe it would have been better to just disable its default build in the Makefile. Does anybody have any objections against removing the btrfsctl, btrfs-show and btrfs-vol tools from the all target of Makefile? int main(int ac, char **av) { char *fname = NULL; diff --git a/cmds-balance.c b/cmds-balance.c index 38a7426..5793b5c 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -26,6 +26,7 @@ #include ctree.h #include ioctl.h #include volumes.h +#include utils.h #include commands.h diff --git a/cmds-inspect.c b/cmds-inspect.c index 2f0228f..7a8785b 100644 --- a/cmds-inspect.c +++ b/cmds-inspect.c @@ -22,6 +22,7 @@ #include kerncompat.h #include ioctl.h +#include utils.h #include commands.h diff --git a/cmds-subvolume.c b/cmds-subvolume.c index 950fa8f..8ecd3f4 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -26,6 +26,7 @@ #include kerncompat.h #include ioctl.h +#include utils.h #include commands.h diff --git a/commands.h b/commands.h index a303a50..aea4cb1 100644 --- a/commands.h +++ b/commands.h @@ -79,9 +79,6 @@ void help_ambiguous_token(const char *arg, const struct cmd_group *grp); void help_command_group(const struct cmd_group *grp, int argc, char **argv); -/* common.c */ -int open_file_or_dir(const char *fname); - extern const struct cmd_group subvolume_cmd_group; extern const struct cmd_group filesystem_cmd_group; extern const struct cmd_group balance_cmd_group; diff --git a/common.c b/common.c deleted file mode 100644 index 03f6570..000 --- a/common.c +++ /dev/null @@ -1,46 +0,0 @@ -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public - * License v2 as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public - *
btrfs filesystems can only be mounted after an unclean shutdown if btrfsck is run and immediately killed!
Hi all, I have two multi-disk btrfs filesystems on a Arch linux 3.4.0 system. After a power failure, both filesystems refuse to mount [ 10.402284] Btrfs loaded [ 10.402714] device fsid 1e7c18a4-02d6-44b1-8eaf-c01378009cd3 devid 4 transid 65282 /dev/sdc [ 10.403108] btrfs: force zlib compression [ 10.403130] btrfs: enabling inode map caching [ 10.403152] btrfs: disk space caching is enabled [ 10.403377] btrfs: failed to read the system array on sdc [ 10.403557] btrfs: open_ctree failed [ 10.431763] device fsid 7f7be913-e359-400f-8bdb-7ef48aad3f03 devid 2 transid 3916 /dev/sdb [ 10.432180] btrfs: force zlib compression [ 10.433040] btrfs: enabling inode map caching [ 10.433892] btrfs: disk space caching is enabled [ 10.434930] btrfs: failed to read the system array on sdb [ 10.435945] btrfs: open_ctree failed fstab: UUID=1e7c18a4-02d6-44b1-8eaf-c01378009cd3 /storage/btrfs btrfs noatime,compress-force=zlib,space_cache,inode_cache 0 0 UUID=7f7be913-e359-400f-8bdb-7ef48aad3f03 /storage/btrfs2 btrfs noatime,compress-force=zlib,space_cache,inode_cache 0 0 The funny thing is that if i run btrfsck for one second on the first filesystem and then kill it with ctrl-c, then both filesystems can be mounted without any problems! I have this problem for many months, probably for all 3.x kernels and maybe a bit older, all git btrfs tools since at least late last year. [root@linuxserver ~/btrfs-progs]# btrfs fi show /dev/sdb Label: none uuid: 7f7be913-e359-400f-8bdb-7ef48aad3f03 Total devices 2 FS bytes used 1.54TB devid1 size 1.82TB used 1.04TB path /dev/sda devid2 size 1.82TB used 1.04TB path /dev/sdb Btrfs Btrfs v0.19 [root@linuxserver ~/btrfs-progs]# btrfs fi show /dev/sdf Label: none uuid: 1e7c18a4-02d6-44b1-8eaf-c01378009cd3 Total devices 4 FS bytes used 4.33TB devid5 size 1.82TB used 1.82TB path /dev/sdg devid4 size 1.82TB used 1.82TB path /dev/sdc devid3 size 1.82TB used 1.79TB path /dev/sdf devid1 size 1.82TB used 1.82TB path /dev/sdd Btrfs Btrfs v0.19 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs filesystems can only be mounted after an unclean shutdown if btrfsck is run and immediately killed!
On Παρασκευή, 8 Ιούνιος 2012 11:28:39 πμ, Tomasz Torcz wrote: On Fri, Jun 08, 2012 at 11:26:21AM +0300, Konstantinos Skarlatos wrote: Hi all, I have two multi-disk btrfs filesystems on a Arch linux 3.4.0 system. After a power failure, both filesystems refuse to mount Multi-device filesystem had to be first fully discovered by btrfs device scan. It is typically done from udev rules. Also, dracut does it in initramfs for quite a long time. (Added cc to btrfs list) You are right, i have forgotten to enable it (arch linux has a new rc.conf option for that), i will reboot in a few minutes to test it. Maybe it would be prudent to give a better error message when such a thing happens, or even make mount run btrfs device scan if it detects that a multi-drive fs is being mounted? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs filesystems can only be mounted after an unclean shutdown if btrfsck is run and immediately killed!
On Fri, Jun 08, 2012 at 11:26:21AM +0300, Konstantinos Skarlatos wrote: Hi all, I have two multi-disk btrfs filesystems on a Arch linux 3.4.0 system. After a power failure, both filesystems refuse to mount Multi-device filesystem had to be first fully discovered by btrfs device scan. It is typically done from udev rules. Also, dracut does it in initramfs for quite a long time. -- Tomasz TorczTo co nierealne -- tutaj jest normalne. xmpp: zdzich...@chrome.pl Ziomale na życie mają tu patenty specjalne. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New WARNING at fs/btrfs/extent_map.c:226 unpin_extent_cache with kernel 3.5-rc1
On Thu, Jun 07, 2012 at 09:58:44PM +0200, Catalin Iacob wrote: I just booted into a current git kernel and got 6 of these in about 1h30min of usage Yeah sorry! I've been working on this all week, I've just about got it nailed down, go back to 3.4 and you'll be fine for now. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs-progs: remove btrfsctl, btrfs-show and btrfs-vol from default build
Remove btrfsctl, btrfs-show and btrfs-vol from all target of Makefile. Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de --- Makefile |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index aaf1381..79e7a56 100644 --- a/Makefile +++ b/Makefile @@ -18,7 +18,7 @@ bindir = $(prefix)/bin LIBS=-luuid RESTORE_LIBS=-lz -progs = btrfsctl mkfs.btrfs btrfs-debug-tree btrfs-show btrfs-vol btrfsck \ +progs = mkfs.btrfs btrfs-debug-tree btrfsck \ btrfs btrfs-map-logical btrfs-image btrfs-zero-log btrfs-convert \ btrfs-find-root btrfs-restore btrfstune -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][btrfs-progs] add command btrfs filesystem info
Hi all the aim of this patch is to add the command btrfs filesystem info to show some filesystem information. Example: $ sudo btrfs-progs/btrfs filesystem info /mnt/test Path: /mnt/test Max ID: 4 UUID: 1c7e4ba6-aebc-4b39-90ef-c61315fb74d1 Num devices: 3 Dev ID: 2 UUID: cdbda186-735e-49cd-babc-e84fffdb4105 Bytes used: 511705088 Total bytes: 1667235840 Path: /dev/loop1 Dev ID: 3 UUID: a4cebf4e-f760-49a7-b960-142b2c14d9e6 Bytes used: 0 Total bytes: 490560 Path: /dev/loop2 Dev ID: 4 UUID: 61bd5825-f86e-4708-8a5b-f48764b5c3a5 Bytes used: 0 Total bytes: 490560 Path: /dev/loop3 I make that because it is not so trivial to get these information otherwise. To get the information, I made public and used: scrub_device_info() and scrub_fs_info(). So, I moved the functions to the file utils.c changing scrub_ to get_. I was inspired by a patch of Stefan Behrens. Next enhancements: - remove the root privileges to launch the command (require change in kernel space) - show also the label of the filesystem (same as above) Please review; as usual, comments are welcome. You can pull the changes from the reposiotry http://cassiopea.homelinux.net/git/btrfs-progs-unstable.git branch filesystem-info BR G.Baroncelli Makefile |8 +++--- cmds-filesystem.c | 59 cmds-scrub.c | 72 +- man/btrfs.8.in|8 ++ utils.c | 68 +++ utils.h |4 +++ 6 files changed, 145 insertions(+), 74 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] Add btrfs filesystem info command.
Add btrfs filesystem info command, which is capable to show some btrfs filesystem information. --- cmds-filesystem.c | 59 + 1 file changed, 59 insertions(+) diff --git a/cmds-filesystem.c b/cmds-filesystem.c index 1f53d1c..9acc715 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -519,6 +519,64 @@ static int cmd_label(int argc, char **argv) return get_label(argv[1]); } +static const char * const cmd_info_usage[] = { + btrfs filesystem info path, + Get filesystem information, + NULL +}; + +static int cmd_info(int argc, char **argv){ + + struct btrfs_ioctl_fs_info_args fi_args; + struct btrfs_ioctl_dev_info_args*di_ret=NULL; + int ret,i; + charuuidbuf[37]; + int fd; + + if( argc != 2){ + fprintf(stderr, ERROR: missing argument\n); + return 102; + } + + fd = open_file_or_dir(argv[1]); + if( fd 0 ){ + fprintf(stderr, ERROR: cannot open '%s'\n, argv[1]); + return 103; + } + + ret = get_fs_info(fd, argv[1], fi_args, di_ret); + if( ret ){ + int e = errno; + fprintf(stderr, ERROR: cannot get information about '%s'\n, + argv[1]); + fprintf(stderr, ERROR: errno=%d, error=%s\n, + e, strerror(e) ); + close(fd); + if(di_ret != NULL ) + free(di_ret); + return 104; + } + + printf(Path: %s\n,argv[1]); + printf(Max ID: %llu\n, fi_args.max_id); + uuid_unparse(fi_args.fsid, uuidbuf); + printf(UUID: %s\n, uuidbuf); + + printf(\nNum devices: %llu\n, fi_args.num_devices); + for( i = 0 ; i fi_args.num_devices ; i++ ){ + printf( Dev ID: %llu\n, di_ret[i].devid); + uuid_unparse(di_ret[i].uuid, uuidbuf); + printf( UUID: %s\n, uuidbuf); + printf( Bytes used: %llu\n, di_ret[i].bytes_used ); + printf( Total bytes: %llu\n, di_ret[i].total_bytes ); + printf( Path: %s\n, di_ret[i].path); + } + + close(fd); + free(di_ret); + return 0; +} + const struct cmd_group filesystem_cmd_group = { filesystem_cmd_group_usage, NULL, { { df, cmd_df, cmd_df_usage, NULL, 0 }, @@ -528,6 +586,7 @@ const struct cmd_group filesystem_cmd_group = { { balance, cmd_balance, NULL, balance_cmd_group, 1 }, { resize, cmd_resize, cmd_resize_usage, NULL, 0 }, { label, cmd_label, cmd_label_usage, NULL, 0 }, + { info, cmd_info, cmd_info_usage, NULL, 0 }, { 0, 0, 0, 0, 0 }, } }; -- 1.7.10 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] Btrfs-progs: make two utility functions globally available
This patch is ispired by a Stefan Behrens one. --- Makefile |8 +++ cmds-scrub.c | 72 ++ utils.c | 68 ++ utils.h |4 4 files changed, 78 insertions(+), 74 deletions(-) diff --git a/Makefile b/Makefile index 79818e6..0915021 100644 --- a/Makefile +++ b/Makefile @@ -4,7 +4,7 @@ CFLAGS = -g -O0 objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \ root-tree.o dir-item.o file-item.o inode-item.o \ inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \ - volumes.o utils.o btrfs-list.o btrfslabel.o repair.o + volumes.o utils.o btrfs-list.o btrfslabel.o repair.o common.o cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \ cmds-inspect.o cmds-balance.o @@ -39,8 +39,8 @@ all: version $(progs) manpages version: bash version.sh -btrfs: $(objects) btrfs.o help.o common.o $(cmds_objects) - $(CC) $(CFLAGS) -o btrfs btrfs.o help.o common.o $(cmds_objects) \ +btrfs: $(objects) btrfs.o help.o $(cmds_objects) + $(CC) $(CFLAGS) -o btrfs btrfs.o help.o $(cmds_objects) \ $(objects) $(LDFLAGS) $(LIBS) -lpthread calc-size: $(objects) calc-size.o @@ -52,7 +52,7 @@ btrfs-find-root: $(objects) find-root.o btrfs-restore: $(objects) restore.o gcc $(CFLAGS) -o btrfs-restore restore.o $(objects) $(LDFLAGS) $(LIBS) $(RESTORE_LIBS) -btrfsctl: $(objects) btrfsctl.o +btrfsctl: $(objects) btrfsctl.o $(CC) $(CFLAGS) -o btrfsctl btrfsctl.o $(objects) $(LDFLAGS) $(LIBS) btrfs-vol: $(objects) btrfs-vol.o diff --git a/cmds-scrub.c b/cmds-scrub.c index c4503f4..37a9890 100644 --- a/cmds-scrub.c +++ b/cmds-scrub.c @@ -967,74 +967,6 @@ static struct scrub_file_record *last_dev_scrub( return NULL; } -static int scrub_device_info(int fd, u64 devid, -struct btrfs_ioctl_dev_info_args *di_args) -{ - int ret; - - di_args-devid = devid; - memset(di_args-uuid, '\0', sizeof(di_args-uuid)); - - ret = ioctl(fd, BTRFS_IOC_DEV_INFO, di_args); - return ret ? -errno : 0; -} - -static int scrub_fs_info(int fd, char *path, - struct btrfs_ioctl_fs_info_args *fi_args, - struct btrfs_ioctl_dev_info_args **di_ret) -{ - int ret = 0; - int ndevs = 0; - int i = 1; - struct btrfs_fs_devices *fs_devices_mnt = NULL; - struct btrfs_ioctl_dev_info_args *di_args; - char mp[BTRFS_PATH_NAME_MAX + 1]; - - memset(fi_args, 0, sizeof(*fi_args)); - - ret = ioctl(fd, BTRFS_IOC_FS_INFO, fi_args); - if (ret errno == EINVAL) { - /* path is no mounted btrfs. try if it's a device */ - ret = check_mounted_where(fd, path, mp, sizeof(mp), - fs_devices_mnt); - if (!ret) - return -EINVAL; - if (ret 0) - return ret; - fi_args-num_devices = 1; - fi_args-max_id = fs_devices_mnt-latest_devid; - i = fs_devices_mnt-latest_devid; - memcpy(fi_args-fsid, fs_devices_mnt-fsid, BTRFS_FSID_SIZE); - close(fd); - fd = open_file_or_dir(mp); - if (fd 0) - return -errno; - } else if (ret) { - return -errno; - } - - if (!fi_args-num_devices) - return 0; - - di_args = *di_ret = malloc(fi_args-num_devices * sizeof(*di_args)); - if (!di_args) - return -errno; - - for (; i = fi_args-max_id; ++i) { - BUG_ON(ndevs = fi_args-num_devices); - ret = scrub_device_info(fd, i, di_args[ndevs]); - if (ret == -ENODEV) - continue; - if (ret) - return ret; - ++ndevs; - } - - BUG_ON(ndevs == 0); - - return 0; -} - int mkdir_p(char *path) { int i; @@ -1155,7 +1087,7 @@ static int scrub_start(int argc, char **argv, int resume) return 12; } - ret = scrub_fs_info(fdmnt, path, fi_args, di_args); + ret = get_fs_info(fdmnt, path, fi_args, di_args); if (ret) { ERR(!do_quiet, ERROR: getting dev info for scrub failed: %s\n, strerror(-ret)); @@ -1621,7 +1553,7 @@ static int cmd_scrub_status(int argc, char **argv) return 12; } - ret = scrub_fs_info(fdmnt, path, fi_args, di_args); + ret = get_fs_info(fdmnt, path, fi_args, di_args); if (ret) { fprintf(stderr, ERROR: getting dev info for scrub failed: %s\n, strerror(-ret)); diff --git a/utils.c b/utils.c index ee7fa1b..d916912 100644 --- a/utils.c +++
[PATCH 3/3] Add btrfs filesystem info man page.
Update the man page to document the btrfs filesystem info path command. --- man/btrfs.8.in |8 1 file changed, 8 insertions(+) diff --git a/man/btrfs.8.in b/man/btrfs.8.in index be478e0..6d96bf7 100644 --- a/man/btrfs.8.in +++ b/man/btrfs.8.in @@ -25,6 +25,8 @@ btrfs \- control a btrfs filesystem .PP \fBbtrfs\fP \fBfilesystem defrag\fP\fI [options] file|dir [file|dir...]\fP .PP +\fBbtrfs\fP \fBfilesystem info\fP\fI path\fP +.PP \fBbtrfs\fP \fBsubvolume find-new\fP\fI subvolume last_gen\fP .PP \fBbtrfs\fP \fBfilesystem balance\fP\fI path \fP @@ -154,6 +156,12 @@ The start position and the number of bytes to deframention can be specified by \ NOTE: defragmenting with kernels up to 2.6.37 will unlink COW-ed copies of data, don't use it if you use snapshots, have de-duplicated your data or made copies with \fBcp --reflink\fP. +.TP + +\fBfilesystem info\fP \fIpath\fR +Get filesystem info. +.TP + \fBsubvolume find-new\fR\fI subvolume last_gen\fR List the recently modified files in a subvolume, after \fIlast_gen\fR ID. .TP -- 1.7.10 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] Add btrfs filesystem info man page.
What kind of info does it give? This is really unhelpful in a man page, it just tells the obvious. On Fri, Jun 8, 2012 at 3:12 PM, Goffredo Baroncelli kreij...@inwind.it wrote: Update the man page to document the btrfs filesystem info path command. --- man/btrfs.8.in | 8 1 file changed, 8 insertions(+) diff --git a/man/btrfs.8.in b/man/btrfs.8.in index be478e0..6d96bf7 100644 --- a/man/btrfs.8.in +++ b/man/btrfs.8.in @@ -25,6 +25,8 @@ btrfs \- control a btrfs filesystem .PP \fBbtrfs\fP \fBfilesystem defrag\fP\fI [options] file|dir [file|dir...]\fP .PP +\fBbtrfs\fP \fBfilesystem info\fP\fI path\fP +.PP \fBbtrfs\fP \fBsubvolume find-new\fP\fI subvolume last_gen\fP .PP \fBbtrfs\fP \fBfilesystem balance\fP\fI path \fP @@ -154,6 +156,12 @@ The start position and the number of bytes to deframention can be specified by \ NOTE: defragmenting with kernels up to 2.6.37 will unlink COW-ed copies of data, don't use it if you use snapshots, have de-duplicated your data or made copies with \fBcp --reflink\fP. +.TP + +\fBfilesystem info\fP \fIpath\fR +Get filesystem info. +.TP + \fBsubvolume find-new\fR\fI subvolume last_gen\fR List the recently modified files in a subvolume, after \fIlast_gen\fR ID. .TP -- 1.7.10 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: keep inode pinned when compressing writes
A user reported lots of problems using compression on the new code and it turns out part of the problem was that igrab() was failing when we added a new ordered extent. This is because when writing out an inode under compression we immediately return without actually doing anything to the pages, and then in another thread at some point down the line actually do the ordered dance. The problem is between the point that we start writeback and we actually add the ordered extent we could be trying to reclaim the inode, which makes igrab() return NULL. So we need to do an igrab() when we create the async extent and then drop it when we are done with it. This makes sure we stay pinned in memory until the ordered extent can get a reference on it and we are good to go. With this patch we no longer panic in btrfs_finish_ordered_io(). Thanks, Signed-off-by: Josef Bacik jo...@redhat.com --- fs/btrfs/inode.c |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 51b9971..293c018 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -986,8 +986,10 @@ static noinline void async_cow_start(struct btrfs_work *work) compress_file_range(async_cow-inode, async_cow-locked_page, async_cow-start, async_cow-end, async_cow, num_added); - if (num_added == 0) + if (num_added == 0) { + iput(async_cow-inode); async_cow-inode = NULL; + } } /* @@ -1020,6 +1022,8 @@ static noinline void async_cow_free(struct btrfs_work *work) { struct async_cow *async_cow; async_cow = container_of(work, struct async_cow, work); + if (async_cow-inode) + iput(async_cow-inode); kfree(async_cow); } @@ -1038,7 +1042,7 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page, while (start end) { async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS); BUG_ON(!async_cow); /* -ENOMEM */ - async_cow-inode = inode; + async_cow-inode = igrab(inode); async_cow-root = root; async_cow-locked_page = locked_page; async_cow-start = start; -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: call filemap_fdatawrite twice for compression
I removed this in an earlier commit and I was wrong. Because compression can return from filemap_fdatawrite() without having actually set any of it's pages as writeback() it can make filemap_fdatawait() do essentially nothing, and then we won't find any ordered extents because they may not have been created yet. So not only does this make fsync() completely useless, but it will also screw up if you truncate on a non-page aligned offset since we zero out the end and then wait on ordered extents and then call drop caches. We can drop the cache before the io completes and then we try to unpin the extent we just wrote we won't find it and everything goes sideways. So fix this by putting it back and put a giant comment there to keep me from trying to remove it in the future. Thanks, Signed-off-by: Josef Bacik jo...@redhat.com --- fs/btrfs/ordered-data.c | 24 +++- 1 files changed, 23 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 9e138cd..49cacdb 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -611,6 +611,7 @@ void btrfs_start_ordered_extent(struct inode *inode, */ void btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len) { + struct btrfs_root *root = BTRFS_I(inode)-root; u64 end; u64 orig_end; struct btrfs_ordered_extent *ordered; @@ -627,7 +628,28 @@ void btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len) /* start IO across the range first to instantiate any delalloc * extents */ - filemap_write_and_wait_range(inode-i_mapping, start, orig_end); + filemap_fdatawrite_range(inode-i_mapping, start, orig_end); + + /* +* So with compression we will find and lock a dirty page and clear the +* first one as dirty, setup an async extent, and immediately return +* with the entire range locked but with nobody actually marked with +* writeback. So we can't just filemap_write_and_wait_range() and +* expect it to work since it will just kick off a thread to do the +* actual work. So we need to call filemap_fdatawrite_range _again_ +* since it will wait on the page lock, which won't be unlocked until +* after the pages have been marked as writeback and so we're good to go +* from there. We have to do this otherwise we'll miss the ordered +* extents and that results in badness. Please Josef, do not think you +* know better and pull this out at some point in the future, it is +* right and you are wrong. +*/ + if (btrfs_test_opt(root, COMPRESS) || + (BTRFS_I(inode)-force_compress) || + (BTRFS_I(inode)-flags BTRFS_INODE_COMPRESS)) + filemap_fdatawrite_range(inode-i_mapping, start, orig_end); + + filemap_fdatawait_range(inode-i_mapping, start, orig_end); end = orig_end; found = 0; -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Moving top level to a subvolume
On 06/08/2012 09:24 PM, Matthew Hawn wrote: I just converted my root filesystem to btrfs with btrfs-convert. However, since I am running Ubuntu, I would like to have the same subvolume structure as a default install,. How do I move the top-level subvolume (where all my files currently are) to another subvolume? Just snapshot the root subvol and continue working in the snapshot. Matt -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Massive metadata size increase after upgrade from 3.2.18 to 3.4.1
Hello, Before the upgrade (on 3.2.18): Metadata, DUP: total=9.38GB, used=5.94GB After the FS has been mounted once with 3.4.1: Data: total=3.44TB, used=2.67TB System, DUP: total=8.00MB, used=412.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=84.38GB, used=5.94GB Where did my 75 GB of free space just went? -- With respect, Roman ~~~ Stallman had a printer, with code he could not see. So he began to tinker, and set the software free. signature.asc Description: PGP signature
[RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
From: Kirill A. Shutemov kirill.shute...@linux.intel.com Sorry for resend. Original mail had too long cc list. There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fas paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com --- fs/9p/v9fs.c |5 + fs/adfs/super.c |5 + fs/affs/super.c |5 + fs/afs/super.c |5 + fs/befs/linuxvfs.c |5 + fs/bfs/inode.c |5 + fs/btrfs/extent_io.c |6 ++ fs/btrfs/inode.c |5 + fs/ceph/super.c |5 + fs/cifs/cifsfs.c |5 + fs/coda/inode.c |5 + fs/ecryptfs/main.c |6 ++ fs/efs/super.c |5 + fs/exofs/super.c |5 + fs/ext2/super.c |5 + fs/ext3/super.c |5 + fs/ext4/super.c |5 + fs/fat/inode.c |5 + fs/freevxfs/vxfs_super.c |5 + fs/fuse/inode.c |6 ++ fs/hfs/super.c |6 ++ fs/hfsplus/super.c |6 ++ fs/hpfs/super.c |5 + fs/hugetlbfs/inode.c |5 + fs/isofs/inode.c |5 + fs/jffs2/super.c |6 ++ fs/jfs/super.c |6 ++ fs/logfs/inode.c |5 + fs/minix/inode.c |5 + fs/ncpfs/inode.c |5 + fs/nfs/inode.c |5 + fs/nilfs2/super.c|6 ++ fs/ntfs/super.c |6 ++ fs/ocfs2/dlmfs/dlmfs.c |5 + fs/ocfs2/super.c |5 + fs/openpromfs/inode.c|5 + fs/qnx4/inode.c |5 + fs/qnx6/inode.c |5 + fs/reiserfs/super.c |5 + fs/romfs/super.c |5 + fs/squashfs/super.c |5 + fs/super.c |6 -- fs/sysv/inode.c |5 + fs/ubifs/super.c |6 ++ fs/udf/super.c |5 + fs/ufs/super.c |5 + fs/xfs/xfs_super.c |5 + 47 files changed, 240 insertions(+), 6 deletions(-) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index b85efa7..392c5da 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -560,6 +560,11 @@ static int v9fs_init_inode_cache(void) */ static void v9fs_destroy_inode_cache(void) { + /* +* Make sure all delayed rcu free inodes are flushed before we +* destroy cache. +*/ + rcu_barrier(); kmem_cache_destroy(v9fs_inode_cache); } diff --git a/fs/adfs/super.c b/fs/adfs/super.c index 06fdcc9..f137165 100644 --- a/fs/adfs/super.c +++ b/fs/adfs/super.c @@ -276,6 +276,11 @@ static int init_inodecache(void) static void destroy_inodecache(void) { + /* +* Make sure all delayed rcu free inodes are flushed before we +* destroy cache. +*/ + rcu_barrier(); kmem_cache_destroy(adfs_inode_cachep); } diff --git a/fs/affs/super.c b/fs/affs/super.c index 0782653..4fe18a8 100644 --- a/fs/affs/super.c +++ b/fs/affs/super.c @@ -129,6 +129,11 @@ static int init_inodecache(void) static void destroy_inodecache(void) { + /* +* Make sure all delayed rcu free inodes are flushed before we +* destroy cache. +*/ + rcu_barrier(); kmem_cache_destroy(affs_inode_cachep); } diff --git a/fs/afs/super.c b/fs/afs/super.c index f02b31e..af69050 100644 --- a/fs/afs/super.c +++ b/fs/afs/super.c @@ -123,6 +123,11 @@ void __exit afs_fs_exit(void) BUG(); } + /* +* Make sure all delayed rcu free inodes are flushed before we +* destroy cache. +*/ + rcu_barrier(); kmem_cache_destroy(afs_inode_cachep); _leave(); } diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c index e18da23..02f3331 100644 --- a/fs/befs/linuxvfs.c +++ b/fs/befs/linuxvfs.c @@ -454,6 +454,11 @@ befs_init_inodecache(void) static void befs_destroy_inodecache(void) { + /* +* Make sure all delayed rcu free inodes are flushed before we +* destroy cache. +*/ + rcu_barrier(); kmem_cache_destroy(befs_inode_cachep); } diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c index 9870417..d5fc598 100644 --- a/fs/bfs/inode.c +++ b/fs/bfs/inode.c @@ -280,6 +280,11 @@ static int init_inodecache(void) static void destroy_inodecache(void) { + /* +* Make sure all delayed rcu free inodes are flushed before we +* destroy cache. +*/ + rcu_barrier(); kmem_cache_destroy(bfs_inode_cachep); } diff --git a/fs/btrfs/extent_io.c
Re: [RFC, PATCH] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 08, 2012 at 02:43:58PM -0700, Linus Torvalds wrote: On Fri, Jun 8, 2012 at 2:28 PM, Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fas paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. I think we should just delete it. kmem_cache_destroy() (at least for SLUB) already has: if (s-flags SLAB_DESTROY_BY_RCU) rcu_barrier(); in it. But I think it's too late - it gets called *after* we do kmem_cache_close(), and I get the feeling that we should do it before. Shouldn't that be sufficient? And if other slab allocators don't have this, we should add it to them too. Hmm? When I tried SLAB_DESTROY_BY_RCU I've got problem: [ 36.687999] Pid: 3455, comm: rmmod Not tainted 3.5.0-rc1-00130-g48d212a-dirty #40 [ 36.688001] Call Trace: [ 36.688012] [8113367a] slab_err+0xaa/0xd0 [ 36.688020] [8113515a] ? __kmalloc+0x10a/0x110 [ 36.688026] [8113647d] kmem_cache_destroy+0x1dd/0x420 [ 36.688056] [a00f0f25] btrfs_destroy_cachep+0x15/0x60 [btrfs] [ 36.688076] [a013cac3] exit_btrfs_fs+0x9/0x3a [btrfs] [ 36.688083] [810c324e] sys_delete_module+0x16e/0x2f0 [ 36.688090] [8128cf29] ? lockdep_sys_exit_thunk+0x35/0x67 [ 36.688097] [8161eba6] system_call_fastpath+0x1a/0x1f [ 36.688111] Pid: 3455, comm: rmmod Not tainted 3.5.0-rc1-00130-g48d212a-dirty #40 [ 36.688114] Call Trace: [ 36.688119] [811365ee] kmem_cache_destroy+0x34e/0x420 [ 36.688143] [a00f0f25] btrfs_destroy_cachep+0x15/0x60 [btrfs] [ 36.688162] [a013cac3] exit_btrfs_fs+0x9/0x3a [btrfs] [ 36.688168] [810c324e] sys_delete_module+0x16e/0x2f0 [ 36.688174] [8128cf29] ? lockdep_sys_exit_thunk+0x35/0x67 [ 36.688179] [8161eba6] system_call_fastpath+0x1a/0x1f IIUC, moving rcu_barrier() up should help, but I can't say that I fully understand SLAB_DESTROY_BY_RCU semantics. -- Kirill A. Shutemov signature.asc Description: Digital signature
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, 9 Jun 2012 00:41:03 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fas paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. What an unpleasant patch. Is final-process-exiting-ipc-namespace a sufficiently high-frequency operation to justify the change? I don't really understand what's going on here. Are you saying that there is some filesystem against which we run deactivate_locked_super() during exit_group(), and that this filesystem doesn't use rcu-freeing of inodes? The description needs this level of detail, please. The implementation would be less unpleasant if we could do the rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that without adding a dedicated slab flag, which would require editing all the filesystems anyway. (kmem_cache_destroy() already has an rcu_barrier(). Can we do away with the private rcu games in the vfs and switch to SLAB_DESTROY_BY_RCU?) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 8, 2012 at 3:00 PM, Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: IIUC, moving rcu_barrier() up should help, but I can't say that I fully understand SLAB_DESTROY_BY_RCU semantics. .. hmm. I think you may be right. Even if we do move it up, we probably shouldn't use it. We don't even want SLAB_DESTROY_BY_RCU, since we do the delayed RCU free for other reasons anyway, so it would duplicate the RCU delaying and cause problems. I forgot about that little complication. We could have a separate RCU_BARRIER_ON_DESTROY thing, but that's just silly too. Maybe your patch is the right thing. Linus -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote: On Sat, 9 Jun 2012 00:41:03 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fas paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. What an unpleasant patch. Is final-process-exiting-ipc-namespace a sufficiently high-frequency operation to justify the change? I don't really understand what's going on here. Are you saying that there is some filesystem against which we run deactivate_locked_super() during exit_group(), and that this filesystem doesn't use rcu-freeing of inodes? The description needs this level of detail, please. I think the rcu_barrier() is in wrong place. We need it to safely destroy inode cache. deactivate_locked_super() is part of umount() path, but all filesystems I've checked have inode cache for whole filesystem, not per-mount. The implementation would be less unpleasant if we could do the rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that without adding a dedicated slab flag, which would require editing all the filesystems anyway. I think rcu_barrier() for all kmem_cache_destroy() would be too expensive. (kmem_cache_destroy() already has an rcu_barrier(). Can we do away with the private rcu games in the vfs and switch to SLAB_DESTROY_BY_RCU?) -- Kirill A. Shutemov signature.asc Description: Digital signature
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, Jun 09, 2012 at 01:14:46AM +0300, Kirill A. Shutemov wrote: The implementation would be less unpleasant if we could do the rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that without adding a dedicated slab flag, which would require editing all the filesystems anyway. I think rcu_barrier() for all kmem_cache_destroy() would be too expensive. You've got to be kidding. Please, show us the codepath that would be hot enough to make that too expensive and would contain kmem_cache_destroy(). Note that module unload is *not* a hot path - not on any even remotely sane use. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 08, 2012 at 03:06:20PM -0700, Linus Torvalds wrote: .. hmm. I think you may be right. Even if we do move it up, we probably shouldn't use it. We don't even want SLAB_DESTROY_BY_RCU, since we do the delayed RCU free for other reasons anyway, so it would duplicate the RCU delaying and cause problems. I forgot about that little complication. We could have a separate RCU_BARRIER_ON_DESTROY thing, but that's just silly too. Why not make that rcu_barrier() in there unconditional? Where are we creating/destroying caches often enough for that to become a problem? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, 9 Jun 2012 01:14:46 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote: On Sat, 9 Jun 2012 00:41:03 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fas paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. What an unpleasant patch. Is final-process-exiting-ipc-namespace a sufficiently high-frequency operation to justify the change? This, please. I don't really understand what's going on here. Are you saying that there is some filesystem against which we run deactivate_locked_super() during exit_group(), and that this filesystem doesn't use rcu-freeing of inodes? The description needs this level of detail, please. You still haven't explained where this deactivate_locked_super() call is coming from. Oh well. I think the rcu_barrier() is in wrong place. We need it to safely destroy inode cache. deactivate_locked_super() is part of umount() path, but all filesystems I've checked have inode cache for whole filesystem, not per-mount. Well from a design perspective, putting the rcu_barrier() in the vfs is the *correct* place. Individual filesystems shouldn't be hard-coding knowledge about vfs internal machinery. A neater implementation might be to add a kmem_cache* argument to unregister_filesystem(). If that is non-NULL, unregister_filesystem() does the rcu_barrier() and destroys the cache. That way we get to delete (rather than add) a bunch of code from all filesystems and new and out-of-tree filesystems cannot forget to perform the rcu_barrier(). The implementation would be less unpleasant if we could do the rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that without adding a dedicated slab flag, which would require editing all the filesystems anyway. I think rcu_barrier() for all kmem_cache_destroy() would be too expensive. That is not what I proposed. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote: A neater implementation might be to add a kmem_cache* argument to unregister_filesystem(). If that is non-NULL, unregister_filesystem() does the rcu_barrier() and destroys the cache. That way we get to delete (rather than add) a bunch of code from all filesystems and new and out-of-tree filesystems cannot forget to perform the rcu_barrier(). There's often enough more than one cache, so that one is no-go. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 8, 2012 at 3:23 PM, Al Viro v...@zeniv.linux.org.uk wrote: Note that module unload is *not* a hot path - not on any even remotely sane use. Actually, I think we've had distributions that basically did a load pretty much everything, and let God sort it out approach to modules. I know some people *have* actually worried about module load/unload performance. Whether it is remotely sane I'm not going to argue for, but .. Linus -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote: On Sat, 9 Jun 2012 01:14:46 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote: On Sat, 9 Jun 2012 00:41:03 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fas paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. What an unpleasant patch. Is final-process-exiting-ipc-namespace a sufficiently high-frequency operation to justify the change? This, please. ALT Linux guys use a namespaces (including IPC namespace) to create sandbox[1] for build system and other use cases. The build system calls the sandboxing wrapper frequently on setup building chroot and build prepare. This kind of delays affect timings significantly. [1] http://git.altlinux.org/people/ldv/packages/hasher-priv.git I don't really understand what's going on here. Are you saying that there is some filesystem against which we run deactivate_locked_super() during exit_group(), and that this filesystem doesn't use rcu-freeing of inodes? The description needs this level of detail, please. You still haven't explained where this deactivate_locked_super() call is coming from. Oh well. Call Trace: [81443a2a] schedule+0x3a/0x50 [81441e7d] schedule_timeout+0x1cd/0x2c0 [811f8f87] ? mqueue_destroy_inode+0x17/0x20 [81443044] wait_for_common+0xc4/0x160 [8107af50] ? try_to_wake_up+0x2a0/0x2a0 [810d63b0] ? call_rcu_sched+0x10/0x20 [810d63a0] ? call_rcu_bh+0x20/0x20 [81443188] wait_for_completion+0x18/0x20 [810d5a9b] _rcu_barrier.clone.31+0x9b/0xb0 [810d5ac0] rcu_barrier_sched+0x10/0x20 [810d5ad9] rcu_barrier+0x9/0x10 [811602c9] deactivate_locked_super+0x49/0x90 [81160d35] deactivate_super+0x45/0x60 [8117ad74] mntput_no_expire+0x104/0x150 [8117addc] mntput+0x1c/0x30 [8117cda7] kern_unmount+0x27/0x30 [811faeb0] mq_put_mnt+0x10/0x20 [811fb57f] put_ipc_ns+0x3f/0xb0 [81071f5c] free_nsproxy+0x3c/0xa0 [81072143] switch_task_namespaces+0x33/0x40 [8107215b] exit_task_namespaces+0xb/0x10 [8104f154] do_exit+0x4b4/0x8a0 [8104f7e3] do_group_exit+0x53/0xc0 [8104f862] sys_exit_group+0x12/0x20 [8144c939] system_call_fastpath+0x16/0x1b -- Kirill A. Shutemov signature.asc Description: Digital signature
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, 9 Jun 2012 02:31:27 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: On Fri, Jun 08, 2012 at 03:31:20PM -0700, Andrew Morton wrote: On Fri, 8 Jun 2012 23:27:34 +0100 Al Viro v...@zeniv.linux.org.uk wrote: On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote: A neater implementation might be to add a kmem_cache* argument to unregister_filesystem(). If that is non-NULL, unregister_filesystem() does the rcu_barrier() and destroys the cache. That way we get to delete (rather than add) a bunch of code from all filesystems and new and out-of-tree filesystems cannot forget to perform the rcu_barrier(). There's often enough more than one cache, so that one is no-go. kmem_cache** ;) Which filesystems have multiple inode caches? Multiple inode caches? No. Multiple caches with call_rcu() free? See btrfs or gfs2. OK. But for those non-inode caches, the rcu treatment is private to the filesystem. Hence it is appropriate that the filesystem call rcu_barrier() for those caches. But in the case of the inode caches, the rcu treatment is a vfs thing, so it is the vfs which should perform the rcu_barrier(). This is a red herring - those non-inode caches have nothing to do with the issue we're dicussing. So how about open-coding the rcu_barrier() in btrfs and gfs2 for the non-inode caches (which is the appropriate place), and hand the inode cache over to the vfs for treatment (which is the appropriate place). The downside is that btrfs and gfs2 will do an extra rcu_barrier() at umount time. Shrug. If they really want to super-optimise that, they can skip the private rcu_barrier() call and assume that the vfs will be doing it. Not a good idea, IMO. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 8, 2012 at 4:37 PM, Andrew Morton a...@linux-foundation.org wrote: So how about open-coding the rcu_barrier() in btrfs and gfs2 for the non-inode caches (which is the appropriate place), and hand the inode cache over to the vfs for treatment (which is the appropriate place). The thing is, none of the inode caches are really up to the VFS. They are per-filesystem caches, that just *embed* an inode as part of the bigger ext4_inode or whatever. But apart from the fact that the embedded inode requires them to then use the proper call_rcu() stuff to do the delayed free, they really are pretty much filesystem data structures. The VFS layer can neither free them or allocate them, since the VFS layer doesn't even know how big the structures are, or where the inodes are embedded, or how to initialize them (or even when to allocate them). Of course, if you just mean having a VFS wrapper that does static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep) { rcu_barrier(); kmem_cache_destroy(cachep); } then we could do that. Not much better than what Kirill's patch did, but at least we could have that comment in just one single place. Linus -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, 8 Jun 2012 16:46:47 -0700 Linus Torvalds torva...@linux-foundation.org wrote: Of course, if you just mean having a VFS wrapper that does static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep) { rcu_barrier(); kmem_cache_destroy(cachep); } then we could do that. Not much better than what Kirill's patch did, but at least we could have that comment in just one single place. That's conceptually what I meant. But it has the problem that new and out-of-tree filesystems might forget to do it. Which is why I suggest adding a kmem_cache* argument to unregister_filesystem() for this. It's a bit awkward, and the fs can pass in NULL if it knows what it's doing. But it's reliable. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: problem re-adding original mount block device on multi-device fs
On 06/08/2012 06:42 AM, Lennert Buytenhek wrote: (please CC on replies, I'm not subscribed to the list) Hi! This fails: # mkfs.btrfs /dev/sd[bcde] # mount /dev/sdb /mnt/x # btrfs device delete /dev/sdb /mnt/x # btrfs device add /dev/sdb /mnt/x /dev/sdb is mounted # It seems that I have to unmount and remount the fs using another constituent block device before I can re-add the original block device that the filesystem was mounted with: # umount /mnt/x # mount /dev/sdb /mnt/x mount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so # mount /dev/sdc /mnt/x # btrfs device add /dev/sdb /mnt/x # This is on up-to-date F17, kernel 3.4.0(-1.fc17) with btrfs-progs 0.19(-18.fc17). Give a look to this patche: [PATCH 2/2] Btrfs: fix wrong the mount information in /proc This should address (even without solving) this kind of problem. thanks, Lennert -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html