Re: [PATCH v5 1/3] Btrfs-progs: move open_file_or_dir() to utils.c

2012-06-08 Thread Stefan Behrens
On Thu, 07 Jun 2012 21:38:25 +0200, Goffredo Baroncelli wrote:
 Hi Stefan,
 
 On 05/25/2012 04:07 PM, Stefan Behrens wrote:
 This is a preparation step to add support for device stats. The definition
 of the function open_file_or_dir() is moved from common.c to utils.c in
 order to be able to share some common code between scrub and the device
 stats in the following step. That common code uses open_file_or_dir().
 Since open_file_or_dir() makes use of the function dirfd(3), the required
 XOPEN version was raised from 6 to 7.

 Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
 ---
  Makefile |4 ++--
  btrfsctl.c   |   28 
  cmds-balance.c   |1 +
  cmds-inspect.c   |1 +
  cmds-subvolume.c |1 +
  commands.h   |3 ---
  common.c |   46 --
  utils.c  |   31 +--
  utils.h  |3 +++
  9 files changed, 37 insertions(+), 81 deletions(-)

 diff --git a/Makefile b/Makefile
 index 79818e6..fe2b432 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -39,8 +39,8 @@ all: version $(progs) manpages
  version:
  bash version.sh
  
 -btrfs: $(objects) btrfs.o help.o common.o $(cmds_objects)
 -$(CC) $(CFLAGS) -o btrfs btrfs.o help.o common.o $(cmds_objects) \
 +btrfs: $(objects) btrfs.o help.o $(cmds_objects)
 +$(CC) $(CFLAGS) -o btrfs btrfs.o help.o $(cmds_objects) \
  $(objects) $(LDFLAGS) $(LIBS) -lpthread
  
  calc-size: $(objects) calc-size.o
 diff --git a/btrfsctl.c b/btrfsctl.c
 index d45e2a7..f0584f3 100644
 --- a/btrfsctl.c
 +++ b/btrfsctl.c
 @@ -63,34 +63,6 @@ static void print_usage(void)
  exit(1);
  }
  
 -static int open_file_or_dir(const char *fname)
 -{
 -int ret;
 -struct stat st;
 -DIR *dirstream;
 -int fd;
 -
 -ret = stat(fname, st);
 -if (ret  0) {
 -perror(stat:);
 -exit(1);
 -}
 -if (S_ISDIR(st.st_mode)) {
 -dirstream = opendir(fname);
 -if (!dirstream) {
 -perror(opendir);
 -exit(1);
 -}
 -fd = dirfd(dirstream);
 -} else {
 -fd = open(fname, O_RDWR);
 -}
 -if (fd  0) {
 -perror(open);
 -exit(1);
 -}
 -return fd;
 -}
 
 I suppose that you want to remove open_file_or_dir() from btrfsctl.c and
 use the one provided by utils.c (or common.c). However I have to point
 out that these two function are a bit different: the first one does
 exit() in case of error; the second one returns an error.
 
 In any case I think that it is time to deprecate and remove btrfsctl.
 
 BR
 Goffredo

Hi Goffredo,

Yes, I was aware of this difference but did not want to spend _any_
additional effort in the deprecated btrfsctl tool. Maybe it would have
been better to just disable its default build in the Makefile.

Does anybody have any objections against removing the btrfsctl,
btrfs-show and btrfs-vol tools from the all target of Makefile?


  int main(int ac, char **av)
  {
  char *fname = NULL;
 diff --git a/cmds-balance.c b/cmds-balance.c
 index 38a7426..5793b5c 100644
 --- a/cmds-balance.c
 +++ b/cmds-balance.c
 @@ -26,6 +26,7 @@
  #include ctree.h
  #include ioctl.h
  #include volumes.h
 +#include utils.h
  
  #include commands.h
  
 diff --git a/cmds-inspect.c b/cmds-inspect.c
 index 2f0228f..7a8785b 100644
 --- a/cmds-inspect.c
 +++ b/cmds-inspect.c
 @@ -22,6 +22,7 @@
  
  #include kerncompat.h
  #include ioctl.h
 +#include utils.h
  
  #include commands.h
  
 diff --git a/cmds-subvolume.c b/cmds-subvolume.c
 index 950fa8f..8ecd3f4 100644
 --- a/cmds-subvolume.c
 +++ b/cmds-subvolume.c
 @@ -26,6 +26,7 @@
  
  #include kerncompat.h
  #include ioctl.h
 +#include utils.h
  
  #include commands.h
  
 diff --git a/commands.h b/commands.h
 index a303a50..aea4cb1 100644
 --- a/commands.h
 +++ b/commands.h
 @@ -79,9 +79,6 @@ void help_ambiguous_token(const char *arg, const struct 
 cmd_group *grp);
  
  void help_command_group(const struct cmd_group *grp, int argc, char **argv);
  
 -/* common.c */
 -int open_file_or_dir(const char *fname);
 -
  extern const struct cmd_group subvolume_cmd_group;
  extern const struct cmd_group filesystem_cmd_group;
  extern const struct cmd_group balance_cmd_group;
 diff --git a/common.c b/common.c
 deleted file mode 100644
 index 03f6570..000
 --- a/common.c
 +++ /dev/null
 @@ -1,46 +0,0 @@
 -/*
 - * This program is free software; you can redistribute it and/or
 - * modify it under the terms of the GNU General Public
 - * License v2 as published by the Free Software Foundation.
 - *
 - * This program is distributed in the hope that it will be useful,
 - * but WITHOUT ANY WARRANTY; without even the implied warranty of
 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 - * General Public License for more details.
 - *
 - * You should have received a copy of the GNU General Public
 - * 

btrfs filesystems can only be mounted after an unclean shutdown if btrfsck is run and immediately killed!

2012-06-08 Thread Konstantinos Skarlatos

Hi all,
I have two multi-disk btrfs filesystems on a Arch linux 3.4.0 system. 
After a power failure, both filesystems refuse to mount


[   10.402284] Btrfs loaded
[   10.402714] device fsid 1e7c18a4-02d6-44b1-8eaf-c01378009cd3 devid 4 
transid 65282 /dev/sdc

[   10.403108] btrfs: force zlib compression
[   10.403130] btrfs: enabling inode map caching
[   10.403152] btrfs: disk space caching is enabled
[   10.403377] btrfs: failed to read the system array on sdc
[   10.403557] btrfs: open_ctree failed
[   10.431763] device fsid 7f7be913-e359-400f-8bdb-7ef48aad3f03 devid 2 
transid 3916 /dev/sdb

[   10.432180] btrfs: force zlib compression
[   10.433040] btrfs: enabling inode map caching
[   10.433892] btrfs: disk space caching is enabled
[   10.434930] btrfs: failed to read the system array on sdb
[   10.435945] btrfs: open_ctree failed


fstab:

UUID=1e7c18a4-02d6-44b1-8eaf-c01378009cd3 /storage/btrfs btrfs 
noatime,compress-force=zlib,space_cache,inode_cache 0 0
UUID=7f7be913-e359-400f-8bdb-7ef48aad3f03 /storage/btrfs2 btrfs 
noatime,compress-force=zlib,space_cache,inode_cache 0 0



The funny thing is that if i run btrfsck for one second on the first 
filesystem and then kill it with ctrl-c, then both filesystems can be 
mounted without any problems!


I have this problem for many months, probably for all 3.x kernels and 
maybe a bit older, all git btrfs tools since at least late last year.


 [root@linuxserver ~/btrfs-progs]# btrfs fi show /dev/sdb
Label: none  uuid: 7f7be913-e359-400f-8bdb-7ef48aad3f03
Total devices 2 FS bytes used 1.54TB
devid1 size 1.82TB used 1.04TB path /dev/sda
devid2 size 1.82TB used 1.04TB path /dev/sdb

Btrfs Btrfs v0.19
 [root@linuxserver ~/btrfs-progs]# btrfs fi show /dev/sdf
Label: none  uuid: 1e7c18a4-02d6-44b1-8eaf-c01378009cd3
Total devices 4 FS bytes used 4.33TB
devid5 size 1.82TB used 1.82TB path /dev/sdg
devid4 size 1.82TB used 1.82TB path /dev/sdc
devid3 size 1.82TB used 1.79TB path /dev/sdf
devid1 size 1.82TB used 1.82TB path /dev/sdd

Btrfs Btrfs v0.19
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs filesystems can only be mounted after an unclean shutdown if btrfsck is run and immediately killed!

2012-06-08 Thread Konstantinos Skarlatos

On Παρασκευή, 8 Ιούνιος 2012 11:28:39 πμ, Tomasz Torcz wrote:

On Fri, Jun 08, 2012 at 11:26:21AM +0300, Konstantinos Skarlatos wrote:

Hi all,
I have two multi-disk btrfs filesystems on a Arch linux 3.4.0
system. After a power failure, both filesystems refuse to mount


   Multi-device filesystem had to be first fully discovered by
btrfs device scan.  It is typically done from udev rules. Also,
dracut does it in initramfs for quite a long time.


(Added cc to btrfs list)

You are right, i have forgotten to enable it (arch linux has a new 
rc.conf option for that), i will reboot in a few minutes to test it.
Maybe it would be prudent to give a better error message when such a 
thing happens, or even make mount run btrfs device scan if it detects 
that a multi-drive fs is being mounted?

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs filesystems can only be mounted after an unclean shutdown if btrfsck is run and immediately killed!

2012-06-08 Thread Tomasz Torcz
On Fri, Jun 08, 2012 at 11:26:21AM +0300, Konstantinos Skarlatos wrote:
 Hi all,
 I have two multi-disk btrfs filesystems on a Arch linux 3.4.0
 system. After a power failure, both filesystems refuse to mount

  Multi-device filesystem had to be first fully discovered by
btrfs device scan.  It is typically done from udev rules. Also,
dracut does it in initramfs for quite a long time.

-- 
Tomasz TorczTo co nierealne -- tutaj jest normalne.
xmpp: zdzich...@chrome.pl  Ziomale na życie mają tu patenty specjalne.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New WARNING at fs/btrfs/extent_map.c:226 unpin_extent_cache with kernel 3.5-rc1

2012-06-08 Thread Josef Bacik
On Thu, Jun 07, 2012 at 09:58:44PM +0200, Catalin Iacob wrote:
 I just booted into a current git kernel and got 6 of these in about
 1h30min of usage
 

Yeah sorry!  I've been working on this all week, I've just about got it nailed
down, go back to 3.4 and you'll be fine for now.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs-progs: remove btrfsctl, btrfs-show and btrfs-vol from default build

2012-06-08 Thread Stefan Behrens
Remove btrfsctl, btrfs-show and btrfs-vol from all target of Makefile.

Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
---
 Makefile |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index aaf1381..79e7a56 100644
--- a/Makefile
+++ b/Makefile
@@ -18,7 +18,7 @@ bindir = $(prefix)/bin
 LIBS=-luuid
 RESTORE_LIBS=-lz
 
-progs = btrfsctl mkfs.btrfs btrfs-debug-tree btrfs-show btrfs-vol btrfsck \
+progs = mkfs.btrfs btrfs-debug-tree btrfsck \
btrfs btrfs-map-logical btrfs-image btrfs-zero-log btrfs-convert \
btrfs-find-root btrfs-restore btrfstune
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC][btrfs-progs] add command btrfs filesystem info

2012-06-08 Thread Goffredo Baroncelli
Hi all

the aim of this patch is to add the command btrfs filesystem info to 
show some filesystem information. Example:

$ sudo btrfs-progs/btrfs filesystem info /mnt/test
Path: /mnt/test
Max ID: 4
UUID: 1c7e4ba6-aebc-4b39-90ef-c61315fb74d1

Num devices: 3
  Dev ID: 2
  UUID: cdbda186-735e-49cd-babc-e84fffdb4105
  Bytes used: 511705088
  Total bytes: 1667235840
  Path: /dev/loop1
  Dev ID: 3
  UUID: a4cebf4e-f760-49a7-b960-142b2c14d9e6
  Bytes used: 0
  Total bytes: 490560
  Path: /dev/loop2
  Dev ID: 4
  UUID: 61bd5825-f86e-4708-8a5b-f48764b5c3a5
  Bytes used: 0
  Total bytes: 490560
  Path: /dev/loop3

I make that because it is not so trivial to get these information otherwise.

To get the information, I made public and used: scrub_device_info() and 
scrub_fs_info(). So, I moved the functions to the file utils.c changing 
scrub_ to get_. I was inspired by a patch of Stefan Behrens.

Next enhancements:
- remove the root privileges to launch the command (require change in
kernel space)
- show also the label of the filesystem (same as above)

Please review; as usual, comments are welcome.

You can pull the changes from the reposiotry
http://cassiopea.homelinux.net/git/btrfs-progs-unstable.git
branch 
filesystem-info

BR
G.Baroncelli

 Makefile  |8 +++---
 cmds-filesystem.c |   59 
 cmds-scrub.c  |   72 +-
 man/btrfs.8.in|8 ++
 utils.c   |   68 +++
 utils.h   |4 +++
 6 files changed, 145 insertions(+), 74 deletions(-)


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] Add btrfs filesystem info command.

2012-06-08 Thread Goffredo Baroncelli
Add btrfs filesystem info command, which is capable to show some
btrfs filesystem information.
---
 cmds-filesystem.c |   59 +
 1 file changed, 59 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 1f53d1c..9acc715 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -519,6 +519,64 @@ static int cmd_label(int argc, char **argv)
return get_label(argv[1]);
 }
 
+static const char * const cmd_info_usage[] = {
+   btrfs filesystem info path,
+   Get filesystem information,
+   NULL
+};
+
+static int cmd_info(int argc, char **argv){
+
+   struct btrfs_ioctl_fs_info_args fi_args;
+   struct btrfs_ioctl_dev_info_args*di_ret=NULL;
+   int ret,i;
+   charuuidbuf[37];
+   int fd;
+
+   if( argc != 2){
+   fprintf(stderr, ERROR: missing argument\n);
+   return 102;
+   }
+
+   fd = open_file_or_dir(argv[1]);
+   if( fd  0 ){
+   fprintf(stderr, ERROR: cannot open '%s'\n, argv[1]);
+   return 103;
+   }
+
+   ret = get_fs_info(fd, argv[1], fi_args, di_ret);
+   if( ret ){
+   int e = errno;
+   fprintf(stderr, ERROR: cannot get information about '%s'\n,
+   argv[1]);
+   fprintf(stderr, ERROR: errno=%d, error=%s\n,
+   e, strerror(e) );
+   close(fd);
+   if(di_ret != NULL )
+   free(di_ret);
+   return 104;
+   }
+
+   printf(Path: %s\n,argv[1]);
+   printf(Max ID: %llu\n, fi_args.max_id);
+   uuid_unparse(fi_args.fsid, uuidbuf);
+   printf(UUID: %s\n, uuidbuf);
+
+   printf(\nNum devices: %llu\n, fi_args.num_devices);
+   for( i = 0 ; i  fi_args.num_devices ; i++ ){
+   printf(  Dev ID: %llu\n, di_ret[i].devid);
+   uuid_unparse(di_ret[i].uuid, uuidbuf);
+   printf(  UUID: %s\n, uuidbuf);
+   printf(  Bytes used: %llu\n, di_ret[i].bytes_used );
+   printf(  Total bytes: %llu\n, di_ret[i].total_bytes );
+   printf(  Path: %s\n, di_ret[i].path);
+   }
+
+   close(fd);
+   free(di_ret);
+   return 0;
+}
+
 const struct cmd_group filesystem_cmd_group = {
filesystem_cmd_group_usage, NULL, {
{ df, cmd_df, cmd_df_usage, NULL, 0 },
@@ -528,6 +586,7 @@ const struct cmd_group filesystem_cmd_group = {
{ balance, cmd_balance, NULL, balance_cmd_group, 1 },
{ resize, cmd_resize, cmd_resize_usage, NULL, 0 },
{ label, cmd_label, cmd_label_usage, NULL, 0 },
+   { info, cmd_info, cmd_info_usage, NULL, 0 },
{ 0, 0, 0, 0, 0 },
}
 };
-- 
1.7.10

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] Btrfs-progs: make two utility functions globally available

2012-06-08 Thread Goffredo Baroncelli
This patch is ispired by a Stefan Behrens one.
---
 Makefile |8 +++
 cmds-scrub.c |   72 ++
 utils.c  |   68 ++
 utils.h  |4 
 4 files changed, 78 insertions(+), 74 deletions(-)

diff --git a/Makefile b/Makefile
index 79818e6..0915021 100644
--- a/Makefile
+++ b/Makefile
@@ -4,7 +4,7 @@ CFLAGS = -g -O0
 objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \
  root-tree.o dir-item.o file-item.o inode-item.o \
  inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \
- volumes.o utils.o btrfs-list.o btrfslabel.o repair.o
+ volumes.o utils.o btrfs-list.o btrfslabel.o repair.o common.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
   cmds-inspect.o cmds-balance.o
 
@@ -39,8 +39,8 @@ all: version $(progs) manpages
 version:
bash version.sh
 
-btrfs: $(objects) btrfs.o help.o common.o $(cmds_objects)
-   $(CC) $(CFLAGS) -o btrfs btrfs.o help.o common.o $(cmds_objects) \
+btrfs: $(objects) btrfs.o help.o $(cmds_objects)
+   $(CC) $(CFLAGS) -o btrfs btrfs.o help.o $(cmds_objects) \
$(objects) $(LDFLAGS) $(LIBS) -lpthread
 
 calc-size: $(objects) calc-size.o
@@ -52,7 +52,7 @@ btrfs-find-root: $(objects) find-root.o
 btrfs-restore: $(objects) restore.o
gcc $(CFLAGS) -o btrfs-restore restore.o $(objects) $(LDFLAGS) $(LIBS) 
$(RESTORE_LIBS)
 
-btrfsctl: $(objects) btrfsctl.o
+btrfsctl: $(objects) btrfsctl.o 
$(CC) $(CFLAGS) -o btrfsctl btrfsctl.o $(objects) $(LDFLAGS) $(LIBS)
 
 btrfs-vol: $(objects) btrfs-vol.o
diff --git a/cmds-scrub.c b/cmds-scrub.c
index c4503f4..37a9890 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -967,74 +967,6 @@ static struct scrub_file_record *last_dev_scrub(
return NULL;
 }
 
-static int scrub_device_info(int fd, u64 devid,
-struct btrfs_ioctl_dev_info_args *di_args)
-{
-   int ret;
-
-   di_args-devid = devid;
-   memset(di_args-uuid, '\0', sizeof(di_args-uuid));
-
-   ret = ioctl(fd, BTRFS_IOC_DEV_INFO, di_args);
-   return ret ? -errno : 0;
-}
-
-static int scrub_fs_info(int fd, char *path,
-   struct btrfs_ioctl_fs_info_args *fi_args,
-   struct btrfs_ioctl_dev_info_args **di_ret)
-{
-   int ret = 0;
-   int ndevs = 0;
-   int i = 1;
-   struct btrfs_fs_devices *fs_devices_mnt = NULL;
-   struct btrfs_ioctl_dev_info_args *di_args;
-   char mp[BTRFS_PATH_NAME_MAX + 1];
-
-   memset(fi_args, 0, sizeof(*fi_args));
-
-   ret = ioctl(fd, BTRFS_IOC_FS_INFO, fi_args);
-   if (ret  errno == EINVAL) {
-   /* path is no mounted btrfs. try if it's a device */
-   ret = check_mounted_where(fd, path, mp, sizeof(mp),
-   fs_devices_mnt);
-   if (!ret)
-   return -EINVAL;
-   if (ret  0)
-   return ret;
-   fi_args-num_devices = 1;
-   fi_args-max_id = fs_devices_mnt-latest_devid;
-   i = fs_devices_mnt-latest_devid;
-   memcpy(fi_args-fsid, fs_devices_mnt-fsid, BTRFS_FSID_SIZE);
-   close(fd);
-   fd = open_file_or_dir(mp);
-   if (fd  0)
-   return -errno;
-   } else if (ret) {
-   return -errno;
-   }
-
-   if (!fi_args-num_devices)
-   return 0;
-
-   di_args = *di_ret = malloc(fi_args-num_devices * sizeof(*di_args));
-   if (!di_args)
-   return -errno;
-
-   for (; i = fi_args-max_id; ++i) {
-   BUG_ON(ndevs = fi_args-num_devices);
-   ret = scrub_device_info(fd, i, di_args[ndevs]);
-   if (ret == -ENODEV)
-   continue;
-   if (ret)
-   return ret;
-   ++ndevs;
-   }
-
-   BUG_ON(ndevs == 0);
-
-   return 0;
-}
-
 int mkdir_p(char *path)
 {
int i;
@@ -1155,7 +1087,7 @@ static int scrub_start(int argc, char **argv, int resume)
return 12;
}
 
-   ret = scrub_fs_info(fdmnt, path, fi_args, di_args);
+   ret = get_fs_info(fdmnt, path, fi_args, di_args);
if (ret) {
ERR(!do_quiet, ERROR: getting dev info for scrub failed: 
%s\n, strerror(-ret));
@@ -1621,7 +1553,7 @@ static int cmd_scrub_status(int argc, char **argv)
return 12;
}
 
-   ret = scrub_fs_info(fdmnt, path, fi_args, di_args);
+   ret = get_fs_info(fdmnt, path, fi_args, di_args);
if (ret) {
fprintf(stderr, ERROR: getting dev info for scrub failed: 
%s\n, strerror(-ret));
diff --git a/utils.c b/utils.c
index ee7fa1b..d916912 100644
--- a/utils.c
+++ 

[PATCH 3/3] Add btrfs filesystem info man page.

2012-06-08 Thread Goffredo Baroncelli
Update the man page to document the btrfs filesystem info path
command.
---
 man/btrfs.8.in |8 
 1 file changed, 8 insertions(+)

diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index be478e0..6d96bf7 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -25,6 +25,8 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBfilesystem defrag\fP\fI [options] file|dir 
[file|dir...]\fP
 .PP
+\fBbtrfs\fP \fBfilesystem info\fP\fI path\fP
+.PP
 \fBbtrfs\fP \fBsubvolume find-new\fP\fI subvolume last_gen\fP
 .PP
 \fBbtrfs\fP \fBfilesystem balance\fP\fI path \fP
@@ -154,6 +156,12 @@ The start position and the number of bytes to deframention 
can be specified by \
 NOTE: defragmenting with kernels up to 2.6.37 will unlink COW-ed copies of 
data, don't 
 use it if you use snapshots, have de-duplicated your data or made copies with 
 \fBcp --reflink\fP.
+.TP
+
+\fBfilesystem info\fP \fIpath\fR
+Get filesystem info.
+.TP
+
 \fBsubvolume find-new\fR\fI subvolume last_gen\fR
 List the recently modified files in a subvolume, after \fIlast_gen\fR ID.
 .TP
-- 
1.7.10

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Add btrfs filesystem info man page.

2012-06-08 Thread Jérôme Poulin
What kind of info does it give? This is really unhelpful in a man
page, it just tells the obvious.

On Fri, Jun 8, 2012 at 3:12 PM, Goffredo Baroncelli kreij...@inwind.it wrote:

 Update the man page to document the btrfs filesystem info path
 command.
 ---
  man/btrfs.8.in |    8 
  1 file changed, 8 insertions(+)

 diff --git a/man/btrfs.8.in b/man/btrfs.8.in
 index be478e0..6d96bf7 100644
 --- a/man/btrfs.8.in
 +++ b/man/btrfs.8.in
 @@ -25,6 +25,8 @@ btrfs \- control a btrfs filesystem
  .PP
  \fBbtrfs\fP \fBfilesystem defrag\fP\fI [options] file|dir 
 [file|dir...]\fP
  .PP
 +\fBbtrfs\fP \fBfilesystem info\fP\fI path\fP
 +.PP
  \fBbtrfs\fP \fBsubvolume find-new\fP\fI subvolume last_gen\fP
  .PP
  \fBbtrfs\fP \fBfilesystem balance\fP\fI path \fP
 @@ -154,6 +156,12 @@ The start position and the number of bytes to 
 deframention can be specified by \
  NOTE: defragmenting with kernels up to 2.6.37 will unlink COW-ed copies of 
 data, don't
  use it if you use snapshots, have de-duplicated your data or made copies with
  \fBcp --reflink\fP.
 +.TP
 +
 +\fBfilesystem info\fP \fIpath\fR
 +Get filesystem info.
 +.TP
 +
  \fBsubvolume find-new\fR\fI subvolume last_gen\fR
  List the recently modified files in a subvolume, after \fIlast_gen\fR ID.
  .TP
 --
 1.7.10

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: keep inode pinned when compressing writes

2012-06-08 Thread Josef Bacik
A user reported lots of problems using compression on the new code and it
turns out part of the problem was that igrab() was failing when we added a
new ordered extent.  This is because when writing out an inode under
compression we immediately return without actually doing anything to the
pages, and then in another thread at some point down the line actually do
the ordered dance.  The problem is between the point that we start writeback
and we actually add the ordered extent we could be trying to reclaim the
inode, which makes igrab() return NULL.  So we need to do an igrab() when we
create the async extent and then drop it when we are done with it.  This
makes sure we stay pinned in memory until the ordered extent can get a
reference on it and we are good to go.  With this patch we no longer panic
in btrfs_finish_ordered_io().  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/inode.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 51b9971..293c018 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -986,8 +986,10 @@ static noinline void async_cow_start(struct btrfs_work 
*work)
compress_file_range(async_cow-inode, async_cow-locked_page,
async_cow-start, async_cow-end, async_cow,
num_added);
-   if (num_added == 0)
+   if (num_added == 0) {
+   iput(async_cow-inode);
async_cow-inode = NULL;
+   }
 }
 
 /*
@@ -1020,6 +1022,8 @@ static noinline void async_cow_free(struct btrfs_work 
*work)
 {
struct async_cow *async_cow;
async_cow = container_of(work, struct async_cow, work);
+   if (async_cow-inode)
+   iput(async_cow-inode);
kfree(async_cow);
 }
 
@@ -1038,7 +1042,7 @@ static int cow_file_range_async(struct inode *inode, 
struct page *locked_page,
while (start  end) {
async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
BUG_ON(!async_cow); /* -ENOMEM */
-   async_cow-inode = inode;
+   async_cow-inode = igrab(inode);
async_cow-root = root;
async_cow-locked_page = locked_page;
async_cow-start = start;
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: call filemap_fdatawrite twice for compression

2012-06-08 Thread Josef Bacik
I removed this in an earlier commit and I was wrong.  Because compression
can return from filemap_fdatawrite() without having actually set any of it's
pages as writeback() it can make filemap_fdatawait() do essentially nothing,
and then we won't find any ordered extents because they may not have been
created yet.  So not only does this make fsync() completely useless, but it
will also screw up if you truncate on a non-page aligned offset since we
zero out the end and then wait on ordered extents and then call drop caches.
We can drop the cache before the io completes and then we try to unpin the
extent we just wrote we won't find it and everything goes sideways.  So fix
this by putting it back and put a giant comment there to keep me from trying
to remove it in the future.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/ordered-data.c |   24 +++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 9e138cd..49cacdb 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -611,6 +611,7 @@ void btrfs_start_ordered_extent(struct inode *inode,
  */
 void btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len)
 {
+   struct btrfs_root *root = BTRFS_I(inode)-root;
u64 end;
u64 orig_end;
struct btrfs_ordered_extent *ordered;
@@ -627,7 +628,28 @@ void btrfs_wait_ordered_range(struct inode *inode, u64 
start, u64 len)
/* start IO across the range first to instantiate any delalloc
 * extents
 */
-   filemap_write_and_wait_range(inode-i_mapping, start, orig_end);
+   filemap_fdatawrite_range(inode-i_mapping, start, orig_end);
+
+   /*
+* So with compression we will find and lock a dirty page and clear the
+* first one as dirty, setup an async extent, and immediately return
+* with the entire range locked but with nobody actually marked with
+* writeback.  So we can't just filemap_write_and_wait_range() and
+* expect it to work since it will just kick off a thread to do the
+* actual work.  So we need to call filemap_fdatawrite_range _again_
+* since it will wait on the page lock, which won't be unlocked until
+* after the pages have been marked as writeback and so we're good to go
+* from there.  We have to do this otherwise we'll miss the ordered
+* extents and that results in badness.  Please Josef, do not think you
+* know better and pull this out at some point in the future, it is
+* right and you are wrong.
+*/
+   if (btrfs_test_opt(root, COMPRESS) ||
+   (BTRFS_I(inode)-force_compress) ||
+   (BTRFS_I(inode)-flags  BTRFS_INODE_COMPRESS))
+   filemap_fdatawrite_range(inode-i_mapping, start, orig_end);
+
+   filemap_fdatawait_range(inode-i_mapping, start, orig_end);
 
end = orig_end;
found = 0;
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Moving top level to a subvolume

2012-06-08 Thread Arne Jansen
On 06/08/2012 09:24 PM, Matthew Hawn wrote:
 I just converted my root filesystem to btrfs with btrfs-convert.  However, 
 since I am running Ubuntu, I would like to have the same subvolume structure 
 as a default install,. How do I move the top-level subvolume (where all my 
 files currently are) to another subvolume?
 

Just snapshot the root subvol and continue working in the snapshot.


 Matt
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Massive metadata size increase after upgrade from 3.2.18 to 3.4.1

2012-06-08 Thread Roman Mamedov
Hello,

Before the upgrade (on 3.2.18):

Metadata, DUP: total=9.38GB, used=5.94GB

After the FS has been mounted once with 3.4.1:

Data: total=3.44TB, used=2.67TB
System, DUP: total=8.00MB, used=412.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=84.38GB, used=5.94GB

Where did my 75 GB of free space just went?

-- 
With respect,
Roman

~~~
Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free.


signature.asc
Description: PGP signature


[RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Kirill A. Shutemov
From: Kirill A. Shutemov kirill.shute...@linux.intel.com

Sorry for resend. Original mail had too long cc list.


There's no reason to call rcu_barrier() on every deactivate_locked_super().
We only need to make sure that all delayed rcu free inodes are flushed
before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some
fas paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com
---
 fs/9p/v9fs.c |5 +
 fs/adfs/super.c  |5 +
 fs/affs/super.c  |5 +
 fs/afs/super.c   |5 +
 fs/befs/linuxvfs.c   |5 +
 fs/bfs/inode.c   |5 +
 fs/btrfs/extent_io.c |6 ++
 fs/btrfs/inode.c |5 +
 fs/ceph/super.c  |5 +
 fs/cifs/cifsfs.c |5 +
 fs/coda/inode.c  |5 +
 fs/ecryptfs/main.c   |6 ++
 fs/efs/super.c   |5 +
 fs/exofs/super.c |5 +
 fs/ext2/super.c  |5 +
 fs/ext3/super.c  |5 +
 fs/ext4/super.c  |5 +
 fs/fat/inode.c   |5 +
 fs/freevxfs/vxfs_super.c |5 +
 fs/fuse/inode.c  |6 ++
 fs/hfs/super.c   |6 ++
 fs/hfsplus/super.c   |6 ++
 fs/hpfs/super.c  |5 +
 fs/hugetlbfs/inode.c |5 +
 fs/isofs/inode.c |5 +
 fs/jffs2/super.c |6 ++
 fs/jfs/super.c   |6 ++
 fs/logfs/inode.c |5 +
 fs/minix/inode.c |5 +
 fs/ncpfs/inode.c |5 +
 fs/nfs/inode.c   |5 +
 fs/nilfs2/super.c|6 ++
 fs/ntfs/super.c  |6 ++
 fs/ocfs2/dlmfs/dlmfs.c   |5 +
 fs/ocfs2/super.c |5 +
 fs/openpromfs/inode.c|5 +
 fs/qnx4/inode.c  |5 +
 fs/qnx6/inode.c  |5 +
 fs/reiserfs/super.c  |5 +
 fs/romfs/super.c |5 +
 fs/squashfs/super.c  |5 +
 fs/super.c   |6 --
 fs/sysv/inode.c  |5 +
 fs/ubifs/super.c |6 ++
 fs/udf/super.c   |5 +
 fs/ufs/super.c   |5 +
 fs/xfs/xfs_super.c   |5 +
 47 files changed, 240 insertions(+), 6 deletions(-)

diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index b85efa7..392c5da 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -560,6 +560,11 @@ static int v9fs_init_inode_cache(void)
  */
 static void v9fs_destroy_inode_cache(void)
 {
+   /*
+* Make sure all delayed rcu free inodes are flushed before we
+* destroy cache.
+*/
+   rcu_barrier();
kmem_cache_destroy(v9fs_inode_cache);
 }
 
diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index 06fdcc9..f137165 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -276,6 +276,11 @@ static int init_inodecache(void)
 
 static void destroy_inodecache(void)
 {
+   /*
+* Make sure all delayed rcu free inodes are flushed before we
+* destroy cache.
+*/
+   rcu_barrier();
kmem_cache_destroy(adfs_inode_cachep);
 }
 
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 0782653..4fe18a8 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -129,6 +129,11 @@ static int init_inodecache(void)
 
 static void destroy_inodecache(void)
 {
+   /*
+* Make sure all delayed rcu free inodes are flushed before we
+* destroy cache.
+*/
+   rcu_barrier();
kmem_cache_destroy(affs_inode_cachep);
 }
 
diff --git a/fs/afs/super.c b/fs/afs/super.c
index f02b31e..af69050 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -123,6 +123,11 @@ void __exit afs_fs_exit(void)
BUG();
}
 
+   /*
+* Make sure all delayed rcu free inodes are flushed before we
+* destroy cache.
+*/
+   rcu_barrier();
kmem_cache_destroy(afs_inode_cachep);
_leave();
 }
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index e18da23..02f3331 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -454,6 +454,11 @@ befs_init_inodecache(void)
 static void
 befs_destroy_inodecache(void)
 {
+   /*
+* Make sure all delayed rcu free inodes are flushed before we
+* destroy cache.
+*/
+   rcu_barrier();
kmem_cache_destroy(befs_inode_cachep);
 }
 
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 9870417..d5fc598 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -280,6 +280,11 @@ static int init_inodecache(void)
 
 static void destroy_inodecache(void)
 {
+   /*
+* Make sure all delayed rcu free inodes are flushed before we
+* destroy cache.
+*/
+   rcu_barrier();
kmem_cache_destroy(bfs_inode_cachep);
 }
 
diff --git a/fs/btrfs/extent_io.c 

Re: [RFC, PATCH] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Kirill A. Shutemov
On Fri, Jun 08, 2012 at 02:43:58PM -0700, Linus Torvalds wrote:
 On Fri, Jun 8, 2012 at 2:28 PM, Kirill A. Shutemov
 kirill.shute...@linux.intel.com wrote:
  From: Kirill A. Shutemov kirill.shute...@linux.intel.com
 
  There's no reason to call rcu_barrier() on every deactivate_locked_super().
  We only need to make sure that all delayed rcu free inodes are flushed
  before we destroy related cache.
 
  Removing rcu_barrier() from deactivate_locked_super() affects some
  fas paths. E.g. on my machine exit_group() of a last process in IPC
  namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
 
 I think we should just delete it.
 
 kmem_cache_destroy() (at least for SLUB) already has:
 
 if (s-flags  SLAB_DESTROY_BY_RCU)
 rcu_barrier();
 
 in it. But I think it's too late - it gets called *after* we do
 kmem_cache_close(), and I get the feeling that we should do it before.
 
 Shouldn't that be sufficient? And if other slab allocators don't have
 this, we should add it to them too.
 
 Hmm?

When I tried SLAB_DESTROY_BY_RCU I've got problem:

[   36.687999] Pid: 3455, comm: rmmod Not tainted 
3.5.0-rc1-00130-g48d212a-dirty #40
[   36.688001] Call Trace:
[   36.688012]  [8113367a] slab_err+0xaa/0xd0
[   36.688020]  [8113515a] ? __kmalloc+0x10a/0x110
[   36.688026]  [8113647d] kmem_cache_destroy+0x1dd/0x420
[   36.688056]  [a00f0f25] btrfs_destroy_cachep+0x15/0x60 [btrfs]
[   36.688076]  [a013cac3] exit_btrfs_fs+0x9/0x3a [btrfs]
[   36.688083]  [810c324e] sys_delete_module+0x16e/0x2f0
[   36.688090]  [8128cf29] ? lockdep_sys_exit_thunk+0x35/0x67
[   36.688097]  [8161eba6] system_call_fastpath+0x1a/0x1f
[   36.688111] Pid: 3455, comm: rmmod Not tainted 
3.5.0-rc1-00130-g48d212a-dirty #40
[   36.688114] Call Trace:
[   36.688119]  [811365ee] kmem_cache_destroy+0x34e/0x420
[   36.688143]  [a00f0f25] btrfs_destroy_cachep+0x15/0x60 [btrfs]
[   36.688162]  [a013cac3] exit_btrfs_fs+0x9/0x3a [btrfs]
[   36.688168]  [810c324e] sys_delete_module+0x16e/0x2f0
[   36.688174]  [8128cf29] ? lockdep_sys_exit_thunk+0x35/0x67
[   36.688179]  [8161eba6] system_call_fastpath+0x1a/0x1f

IIUC, moving rcu_barrier() up should help, but I can't say that I fully
understand SLAB_DESTROY_BY_RCU semantics.

-- 
 Kirill A. Shutemov


signature.asc
Description: Digital signature


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Andrew Morton
On Sat,  9 Jun 2012 00:41:03 +0300
Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:

 There's no reason to call rcu_barrier() on every deactivate_locked_super().
 We only need to make sure that all delayed rcu free inodes are flushed
 before we destroy related cache.
 
 Removing rcu_barrier() from deactivate_locked_super() affects some
 fas paths. E.g. on my machine exit_group() of a last process in IPC
 namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

What an unpleasant patch.  Is final-process-exiting-ipc-namespace a
sufficiently high-frequency operation to justify the change?

I don't really understand what's going on here.  Are you saying that
there is some filesystem against which we run deactivate_locked_super()
during exit_group(), and that this filesystem doesn't use rcu-freeing
of inodes?  The description needs this level of detail, please.


The implementation would be less unpleasant if we could do the
rcu_barrier() in kmem_cache_destroy().  I can't see a way of doing that
without adding a dedicated slab flag, which would require editing all
the filesystems anyway.


(kmem_cache_destroy() already has an rcu_barrier().  Can we do away
with the private rcu games in the vfs and switch to
SLAB_DESTROY_BY_RCU?)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Linus Torvalds
On Fri, Jun 8, 2012 at 3:00 PM, Kirill A. Shutemov
kirill.shute...@linux.intel.com wrote:

 IIUC, moving rcu_barrier() up should help, but I can't say that I fully
 understand SLAB_DESTROY_BY_RCU semantics.

.. hmm. I think you may be right. Even if we do move it up, we
probably shouldn't use it.

We don't even want SLAB_DESTROY_BY_RCU, since we do the delayed RCU
free for other reasons anyway, so it would duplicate the RCU delaying
and cause problems. I forgot about that little complication.

We could have a separate RCU_BARRIER_ON_DESTROY thing, but that's
just silly too.

Maybe your patch is the right thing.

Linus
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Kirill A. Shutemov
On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote:
 On Sat,  9 Jun 2012 00:41:03 +0300
 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
 
  There's no reason to call rcu_barrier() on every deactivate_locked_super().
  We only need to make sure that all delayed rcu free inodes are flushed
  before we destroy related cache.
  
  Removing rcu_barrier() from deactivate_locked_super() affects some
  fas paths. E.g. on my machine exit_group() of a last process in IPC
  namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
 
 What an unpleasant patch.  Is final-process-exiting-ipc-namespace a
 sufficiently high-frequency operation to justify the change?
 
 I don't really understand what's going on here.  Are you saying that
 there is some filesystem against which we run deactivate_locked_super()
 during exit_group(), and that this filesystem doesn't use rcu-freeing
 of inodes?  The description needs this level of detail, please.

I think the rcu_barrier() is in wrong place. We need it to safely destroy
inode cache. deactivate_locked_super() is part of umount() path, but all
filesystems I've checked have inode cache for whole filesystem, not
per-mount.

 The implementation would be less unpleasant if we could do the
 rcu_barrier() in kmem_cache_destroy().  I can't see a way of doing that
 without adding a dedicated slab flag, which would require editing all
 the filesystems anyway.

I think rcu_barrier() for all kmem_cache_destroy() would be too expensive.
 
 
 (kmem_cache_destroy() already has an rcu_barrier().  Can we do away
 with the private rcu games in the vfs and switch to
 SLAB_DESTROY_BY_RCU?)

-- 
 Kirill A. Shutemov


signature.asc
Description: Digital signature


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Al Viro
On Sat, Jun 09, 2012 at 01:14:46AM +0300, Kirill A. Shutemov wrote:
  The implementation would be less unpleasant if we could do the
  rcu_barrier() in kmem_cache_destroy().  I can't see a way of doing that
  without adding a dedicated slab flag, which would require editing all
  the filesystems anyway.
 
 I think rcu_barrier() for all kmem_cache_destroy() would be too expensive.

You've got to be kidding.  Please, show us the codepath that would be hot
enough to make that too expensive and would contain kmem_cache_destroy().
Note that module unload is *not* a hot path - not on any even remotely sane
use.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Al Viro
On Fri, Jun 08, 2012 at 03:06:20PM -0700, Linus Torvalds wrote:
 .. hmm. I think you may be right. Even if we do move it up, we
 probably shouldn't use it.
 
 We don't even want SLAB_DESTROY_BY_RCU, since we do the delayed RCU
 free for other reasons anyway, so it would duplicate the RCU delaying
 and cause problems. I forgot about that little complication.
 
 We could have a separate RCU_BARRIER_ON_DESTROY thing, but that's
 just silly too.

Why not make that rcu_barrier() in there unconditional?  Where are
we creating/destroying caches often enough for that to become a problem?
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Andrew Morton
On Sat, 9 Jun 2012 01:14:46 +0300
Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:

 On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote:
  On Sat,  9 Jun 2012 00:41:03 +0300
  Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
  
   There's no reason to call rcu_barrier() on every 
   deactivate_locked_super().
   We only need to make sure that all delayed rcu free inodes are flushed
   before we destroy related cache.
   
   Removing rcu_barrier() from deactivate_locked_super() affects some
   fas paths. E.g. on my machine exit_group() of a last process in IPC
   namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
  
  What an unpleasant patch.  Is final-process-exiting-ipc-namespace a
  sufficiently high-frequency operation to justify the change?

This, please.

  I don't really understand what's going on here.  Are you saying that
  there is some filesystem against which we run deactivate_locked_super()
  during exit_group(), and that this filesystem doesn't use rcu-freeing
  of inodes?  The description needs this level of detail, please.

You still haven't explained where this deactivate_locked_super() call
is coming from.  Oh well.

 I think the rcu_barrier() is in wrong place. We need it to safely destroy
 inode cache. deactivate_locked_super() is part of umount() path, but all
 filesystems I've checked have inode cache for whole filesystem, not
 per-mount.

Well from a design perspective, putting the rcu_barrier() in the vfs is
the *correct* place.  Individual filesystems shouldn't be hard-coding
knowledge about vfs internal machinery.

A neater implementation might be to add a kmem_cache* argument to
unregister_filesystem().  If that is non-NULL, unregister_filesystem()
does the rcu_barrier() and destroys the cache.  That way we get to
delete (rather than add) a bunch of code from all filesystems and new
and out-of-tree filesystems cannot forget to perform the rcu_barrier().

  The implementation would be less unpleasant if we could do the
  rcu_barrier() in kmem_cache_destroy().  I can't see a way of doing that
  without adding a dedicated slab flag, which would require editing all
  the filesystems anyway.
 
 I think rcu_barrier() for all kmem_cache_destroy() would be too expensive.

That is not what I proposed.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Al Viro
On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:

 A neater implementation might be to add a kmem_cache* argument to
 unregister_filesystem().  If that is non-NULL, unregister_filesystem()
 does the rcu_barrier() and destroys the cache.  That way we get to
 delete (rather than add) a bunch of code from all filesystems and new
 and out-of-tree filesystems cannot forget to perform the rcu_barrier().

There's often enough more than one cache, so that one is no-go.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Linus Torvalds
On Fri, Jun 8, 2012 at 3:23 PM, Al Viro v...@zeniv.linux.org.uk wrote:

 Note that module unload is *not* a hot path - not on any even remotely sane
 use.

Actually, I think we've had distributions that basically did a load
pretty much everything, and let God sort it out approach to modules.
I know some people *have* actually worried about module load/unload
performance.

Whether it is remotely sane I'm not going to argue for, but ..

Linus
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Kirill A. Shutemov
On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
 On Sat, 9 Jun 2012 01:14:46 +0300
 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
 
  On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote:
   On Sat,  9 Jun 2012 00:41:03 +0300
   Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
   
There's no reason to call rcu_barrier() on every 
deactivate_locked_super().
We only need to make sure that all delayed rcu free inodes are flushed
before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some
fas paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
   
   What an unpleasant patch.  Is final-process-exiting-ipc-namespace a
   sufficiently high-frequency operation to justify the change?
 
 This, please.

ALT Linux guys use a namespaces (including IPC namespace) to create
sandbox[1] for build system and other use cases. The build system calls
the sandboxing wrapper frequently on setup building chroot and build
prepare. This kind of delays affect timings significantly.

[1] http://git.altlinux.org/people/ldv/packages/hasher-priv.git

   I don't really understand what's going on here.  Are you saying that
   there is some filesystem against which we run deactivate_locked_super()
   during exit_group(), and that this filesystem doesn't use rcu-freeing
   of inodes?  The description needs this level of detail, please.
 
 You still haven't explained where this deactivate_locked_super() call
 is coming from.  Oh well.

Call Trace:
 [81443a2a] schedule+0x3a/0x50
 [81441e7d] schedule_timeout+0x1cd/0x2c0
 [811f8f87] ? mqueue_destroy_inode+0x17/0x20
 [81443044] wait_for_common+0xc4/0x160
 [8107af50] ? try_to_wake_up+0x2a0/0x2a0
 [810d63b0] ? call_rcu_sched+0x10/0x20
 [810d63a0] ? call_rcu_bh+0x20/0x20
 [81443188] wait_for_completion+0x18/0x20
 [810d5a9b] _rcu_barrier.clone.31+0x9b/0xb0
 [810d5ac0] rcu_barrier_sched+0x10/0x20
 [810d5ad9] rcu_barrier+0x9/0x10
 [811602c9] deactivate_locked_super+0x49/0x90
 [81160d35] deactivate_super+0x45/0x60
 [8117ad74] mntput_no_expire+0x104/0x150
 [8117addc] mntput+0x1c/0x30
 [8117cda7] kern_unmount+0x27/0x30
 [811faeb0] mq_put_mnt+0x10/0x20
 [811fb57f] put_ipc_ns+0x3f/0xb0
 [81071f5c] free_nsproxy+0x3c/0xa0
 [81072143] switch_task_namespaces+0x33/0x40
 [8107215b] exit_task_namespaces+0xb/0x10
 [8104f154] do_exit+0x4b4/0x8a0
 [8104f7e3] do_group_exit+0x53/0xc0
 [8104f862] sys_exit_group+0x12/0x20
 [8144c939] system_call_fastpath+0x16/0x1b

-- 
 Kirill A. Shutemov


signature.asc
Description: Digital signature


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Andrew Morton
On Sat, 9 Jun 2012 02:31:27 +0300
Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:

 On Fri, Jun 08, 2012 at 03:31:20PM -0700, Andrew Morton wrote:
  On Fri, 8 Jun 2012 23:27:34 +0100
  Al Viro v...@zeniv.linux.org.uk wrote:
  
   On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
   
A neater implementation might be to add a kmem_cache* argument to
unregister_filesystem().  If that is non-NULL, unregister_filesystem()
does the rcu_barrier() and destroys the cache.  That way we get to
delete (rather than add) a bunch of code from all filesystems and new
and out-of-tree filesystems cannot forget to perform the rcu_barrier().
   
   There's often enough more than one cache, so that one is no-go.
  
  kmem_cache** ;)
  
  Which filesystems have multiple inode caches?
 
 Multiple inode caches? No.
 Multiple caches with call_rcu() free? See btrfs or gfs2.

OK.  But for those non-inode caches, the rcu treatment is private to
the filesystem.  Hence it is appropriate that the filesystem call
rcu_barrier() for those caches.

But in the case of the inode caches, the rcu treatment is a vfs thing,
so it is the vfs which should perform the rcu_barrier().

This is a red herring - those non-inode caches have nothing to do with
the issue we're dicussing.


So how about open-coding the rcu_barrier() in btrfs and gfs2 for the
non-inode caches (which is the appropriate place), and hand the inode
cache over to the vfs for treatment (which is the appropriate place).

The downside is that btrfs and gfs2 will do an extra rcu_barrier() at
umount time.  Shrug.

If they really want to super-optimise that, they can skip the private
rcu_barrier() call and assume that the vfs will be doing it.  Not a
good idea, IMO.


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Linus Torvalds
On Fri, Jun 8, 2012 at 4:37 PM, Andrew Morton a...@linux-foundation.org wrote:

 So how about open-coding the rcu_barrier() in btrfs and gfs2 for the
 non-inode caches (which is the appropriate place), and hand the inode
 cache over to the vfs for treatment (which is the appropriate place).

The thing is, none of the inode caches are really up to the VFS. They
are per-filesystem caches, that just *embed* an inode as part of the
bigger ext4_inode or whatever. But apart from the fact that the
embedded inode requires them to then use the proper call_rcu() stuff
to do the delayed free, they really are pretty much filesystem data
structures. The VFS layer can neither free them or allocate them,
since the VFS layer doesn't even know how big the structures are, or
where the inodes are embedded, or how to initialize them (or even when
to allocate them).

Of course, if you just mean having a VFS wrapper that does

static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
{
rcu_barrier();
kmem_cache_destroy(cachep);
}

then we could do that. Not much better than what Kirill's patch did,
but at least we could have that comment in just one single place.

Linus
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems

2012-06-08 Thread Andrew Morton
On Fri, 8 Jun 2012 16:46:47 -0700 Linus Torvalds 
torva...@linux-foundation.org wrote:

 Of course, if you just mean having a VFS wrapper that does
 
 static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
 {
 rcu_barrier();
 kmem_cache_destroy(cachep);
 }
 
 then we could do that. Not much better than what Kirill's patch did,
 but at least we could have that comment in just one single place.

That's conceptually what I meant.  But it has the problem that new and
out-of-tree filesystems might forget to do it.  Which is why I suggest
adding a kmem_cache* argument to unregister_filesystem() for this.

It's a bit awkward, and the fs can pass in NULL if it knows what it's
doing.  But it's reliable.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: problem re-adding original mount block device on multi-device fs

2012-06-08 Thread Goffredo Baroncelli
On 06/08/2012 06:42 AM, Lennert Buytenhek wrote:
 (please CC on replies, I'm not subscribed to the list)
 
 Hi!
 
 This fails:
 
   # mkfs.btrfs /dev/sd[bcde]
   # mount /dev/sdb /mnt/x
   # btrfs device delete /dev/sdb /mnt/x
   # btrfs device add /dev/sdb /mnt/x
   /dev/sdb is mounted
   #
 
 It seems that I have to unmount and remount the fs using another
 constituent block device before I can re-add the original block
 device that the filesystem was mounted with:
 
   # umount /mnt/x
   # mount /dev/sdb /mnt/x
   mount: wrong fs type, bad option, bad superblock on /dev/sdb,
  missing codepage or helper program, or other error
  In some cases useful info is found in syslog - try
  dmesg | tail or so
   # mount /dev/sdc /mnt/x
   # btrfs device add /dev/sdb /mnt/x
   #
 
 This is on up-to-date F17, kernel 3.4.0(-1.fc17) with btrfs-progs
 0.19(-18.fc17).

Give a look to this patche:

[PATCH 2/2] Btrfs: fix wrong the mount information in /proc

This should address (even without solving) this kind of problem.


 
 
 thanks,
 Lennert
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 .
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html