Re: [PATCH] btrfs: Use immediate assignment when referencing cc-option

2021-03-16 Thread Anand Jain

On 17/03/2021 06:46, Victor Erminpour wrote:

Calling cc-option will use KBUILD_CFLAGS, which when lazy setting
subdir-ccflags-y produces the following build error:

scripts/Makefile.lib:10: *** Recursive variable `KBUILD_CFLAGS' \
references itself (eventually).  Stop.

Use := assignment to subdir-ccflags-y when referencing cc-option.
This causes make to also evaluate += immediately, cc-option
calls are done right away and we don't end up with KBUILD_CFLAGS
referencing itself.



Thanks for the patch.
Reviewed-by: Anand Jain 


Signed-off-by: Victor Erminpour 
---
  fs/btrfs/Makefile | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index b634c42115ea..3dba1336fa95 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,10 +7,10 @@ subdir-ccflags-y += -Wmissing-format-attribute
  subdir-ccflags-y += -Wmissing-prototypes
  subdir-ccflags-y += -Wold-style-definition
  subdir-ccflags-y += -Wmissing-include-dirs
-subdir-ccflags-y += $(call cc-option, -Wunused-but-set-variable)
-subdir-ccflags-y += $(call cc-option, -Wunused-const-variable)
-subdir-ccflags-y += $(call cc-option, -Wpacked-not-aligned)
-subdir-ccflags-y += $(call cc-option, -Wstringop-truncation)




+subdir-ccflags-y := $(call cc-option, -Wunused-but-set-variable)
+subdir-ccflags-y := $(call cc-option, -Wunused-const-variable)
+subdir-ccflags-y := $(call cc-option, -Wpacked-not-aligned)
+subdir-ccflags-y := $(call cc-option, -Wstringop-truncation)






  # The following turn off the warnings enabled by -Wextra
  subdir-ccflags-y += -Wno-missing-field-initializers
  subdir-ccflags-y += -Wno-sign-compare





Re: [PATCH] btrfs: turn btrfs_destroy_delayed_refs() into void function

2021-03-09 Thread Anand Jain

On 9/3/21 5:32 pm, Yang Li wrote:

This function always return '0' and no callers use the return value.
So make it a void function.

This eliminates the following coccicheck warning:
./fs/btrfs/disk-io.c:4522:5-8: Unneeded variable: "ret". Return "0" on
line 4530

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 


Reviewed-by: Anand Jain 


---
  fs/btrfs/disk-io.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 41b718c..b75d2d9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -52,7 +52,7 @@
  
  static void end_workqueue_fn(struct btrfs_work *work);

  static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
-static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
+static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info);


 A nit...
 The declare here can be removed without moving the code.

Thanks, Anand


  static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
  static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
@@ -4513,13 +4513,12 @@ static void btrfs_destroy_all_ordered_extents(struct 
btrfs_fs_info *fs_info)
btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1);
  }
  
-static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,

+static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info)
  {
struct rb_node *node;
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_delayed_ref_node *ref;
-   int ret = 0;
  
  	delayed_refs = >delayed_refs;
  
@@ -4592,8 +4591,6 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,

btrfs_qgroup_destroy_extent_records(trans);
  
  	spin_unlock(_refs->lock);

-
-   return ret;
  }
  
  static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)






Re: [PATCH RFC 6/6] btrfs: Add roundrobin raid1 read policy

2021-02-10 Thread Anand Jain

On 10/02/2021 04:30, Michal Rostecki wrote:

From: Michal Rostecki 

Add a new raid1 read policy `roundrobin`. For each read request, it
selects the mirror which has lower load than queue depth and it starts
iterating from the last used mirror (by the current CPU). Load is
defined as the number of inflight requests + a potential penalty value.

The policy can be enabled through sysfs:

   # echo roundrobin > /sys/fs/btrfs/[fsid]/read_policies/policy

This policy was tested with fio and compared with the default `pid`
policy.

The singlethreaded test has the following parameters:

   [global]
   name=btrfs-raid1-seqread
   filename=btrfs-raid1-seqread
   rw=read
   bs=64k
   direct=0
   numjobs=1
   time_based=0

   [file1]
   size=10G
   ioengine=libaio

and shows the following results:

- raid1c3 with 3 HDDs:
   3 x Segate Barracuda ST2000DM008 (2TB)
   * pid policy
 READ: bw=217MiB/s (228MB/s), 217MiB/s-217MiB/s (228MB/s-228MB/s),
 io=10.0GiB (10.7GB), run=47082-47082msec
   * roundrobin policy
 READ: bw=409MiB/s (429MB/s), 409MiB/s-409MiB/s (429MB/s-429MB/s),
 io=10.0GiB (10.7GB), run=25028-25028mse
- raid1c3 with 2 HDDs and 1 SSD:
   2 x Segate Barracuda ST2000DM008 (2TB)
   1 x Crucial CT256M550SSD1 (256GB)
   * pid policy (the worst case when only HDDs were chosen)
 READ: bw=220MiB/s (231MB/s), 220MiB/s-220MiB/s (231MB/s-231MB/s),
 io=10.0GiB (10.7GB), run=46577-46577mse
   * pid policy (the best case when SSD was used as well)
 READ: bw=513MiB/s (538MB/s), 513MiB/s-513MiB/s (538MB/s-538MB/s),
 io=10.0GiB (10.7GB), run=19954-19954msec
   * roundrobin (there are no noticeable differences when testing multiple
 times)
 READ: bw=541MiB/s (567MB/s), 541MiB/s-541MiB/s (567MB/s-567MB/s),
 io=10.0GiB (10.7GB), run=18933-18933msec

The multithreaded test has the following parameters:

   [global]
   name=btrfs-raid1-seqread
   filename=btrfs-raid1-seqread
   rw=read
   bs=64k
   direct=0
   numjobs=8
   time_based=0

   [file1]
   size=10G
   ioengine=libaio

and shows the following results:

- raid1c3 with 3 HDDs: 3 x Segate Barracuda ST2000DM008 (2TB)
   3 x Segate Barracuda ST2000DM008 (2TB)
   * pid policy
 READ: bw=1569MiB/s (1645MB/s), 196MiB/s-196MiB/s (206MB/s-206MB/s),
 io=80.0GiB (85.9GB), run=52210-52211msec
   * roundrobin
 READ: bw=1733MiB/s (1817MB/s), 217MiB/s-217MiB/s (227MB/s-227MB/s),
 io=80.0GiB (85.9GB), run=47269-47271msec
- raid1c3 with 2 HDDs and 1 SSD:
   2 x Segate Barracuda ST2000DM008 (2TB)
   1 x Crucial CT256M550SSD1 (256GB)
   * pid policy
 READ: bw=1843MiB/s (1932MB/s), 230MiB/s-230MiB/s (242MB/s-242MB/s),
 io=80.0GiB (85.9GB), run=9-44450msec
   * roundrobin
 READ: bw=2485MiB/s (2605MB/s), 311MiB/s-311MiB/s (326MB/s-326MB/s),
 io=80.0GiB (85.9GB), run=32969-32970msec






The penalty value is an additional value added to the number of inflight
requests when a scheduled request is non-local (which means it would
start from the different physical location than the physical location of
the last request processed by the given device). By default, it's
applied only in filesystems which have mixed types of devices
(non-rotational and rotational), but it can be configured to be applied
without that condition.

The configuration is done through sysfs:
> - /sys/fs/btrfs/[fsid]/read_policies/roundrobin_nonlocal_inc_mixed_only

where 1 (the default) value means applying penalty only in mixed arrays,
0 means applying it unconditionally.

>

The exact penalty value is defined separately for non-rotational and
rotational devices. By default, it's 0 for non-rotational devices and 1
for rotational devices. Both values are configurable through sysfs:

- /sys/fs/btrfs/[fsid]/read_policies/roundrobin_nonrot_nonlocal_inc
- /sys/fs/btrfs/[fsid]/read_policies/roundrobin_rot_nonlocal_inc

To sum it up - the default case is applying the penalty under the
following conditions:

- the raid1 array consists of mixed types of devices
- the scheduled request is going to be non-local for the given disk
- the device is rotational

>

That default case is based on a slight preference towards non-rotational
disks in mixed arrays and has proven to give the best performance in
tested arrays.

>> For the array with 3 HDDs, not adding any penalty resulted in 409MiB/s

(429MB/s) performance. Adding the penalty value 1 resulted in a
performance drop to 404MiB/s (424MB/s). Increasing the value towards 10
was making the performance even worse.

For the array with 2 HDDs and 1 SSD, adding penalty value 1 to
rotational disks resulted in the best performance - 541MiB/s (567MB/s).
Not adding any value and increasing the value was making the performance
worse.
> Adding penalty value to non-rotational disks was always decreasing the
performance, which motivated setting it as 0 by default. For the purpose
of testing, it's still configurable.

>

To measure the performance of each policy and find optimal penalty
values, I created scripts which 

Re: [PATCH RFC 0/6] Add roundrobin raid1 read policy

2021-02-09 Thread Anand Jain

On 10/02/2021 04:30, Michal Rostecki wrote:

From: Michal Rostecki 

This patch series adds a new raid1 read policy - roundrobin. For each
request, it selects the mirror which has lower load than queue depth.
Load is defined  as the number of inflight requests + a penalty value
(if the scheduled request is not local to the last processed request for
a rotational disk).

The series consists of preparational changes which add necessary
information to the btrfs_device struct and the change with the policy.

This policy was tested with fio and compared with the default `pid`
policy.

The singlethreaded test has the following parameters:

   [global]
   name=btrfs-raid1-seqread
   filename=btrfs-raid1-seqread
   rw=read
   bs=64k
   direct=0
   numjobs=1
   time_based=0

   [file1]
   size=10G
   ioengine=libaio

and shows the following results:

- raid1c3 with 3 HDDs:
   3 x Segate Barracuda ST2000DM008 (2TB)
   * pid policy
 READ: bw=217MiB/s (228MB/s), 217MiB/s-217MiB/s (228MB/s-228MB/s),
 io=10.0GiB (10.7GB), run=47082-47082msec
   * roundrobin policy
 READ: bw=409MiB/s (429MB/s), 409MiB/s-409MiB/s (429MB/s-429MB/s),
 io=10.0GiB (10.7GB), run=25028-25028mse




- raid1c3 with 2 HDDs and 1 SSD:
   2 x Segate Barracuda ST2000DM008 (2TB)
   1 x Crucial CT256M550SSD1 (256GB)
   * pid policy (the worst case when only HDDs were chosen)
 READ: bw=220MiB/s (231MB/s), 220MiB/s-220MiB/s (231MB/s-231MB/s),
 io=10.0GiB (10.7GB), run=46577-46577mse
   * pid policy (the best case when SSD was used as well)
 READ: bw=513MiB/s (538MB/s), 513MiB/s-513MiB/s (538MB/s-538MB/s),
 io=10.0GiB (10.7GB), run=19954-19954msec
   * roundrobin (there are no noticeable differences when testing multiple
 times)
 READ: bw=541MiB/s (567MB/s), 541MiB/s-541MiB/s (567MB/s-567MB/s),
 io=10.0GiB (10.7GB), run=18933-18933msec

The multithreaded test has the following parameters:

   [global]
   name=btrfs-raid1-seqread
   filename=btrfs-raid1-seqread
   rw=read
   bs=64k
   direct=0
   numjobs=8
   time_based=0

   [file1]
   size=10G
   ioengine=libaio

and shows the following results:

- raid1c3 with 3 HDDs: 3 x Segate Barracuda ST2000DM008 (2TB)
   3 x Segate Barracuda ST2000DM008 (2TB)
   * pid policy
 READ: bw=1569MiB/s (1645MB/s), 196MiB/s-196MiB/s (206MB/s-206MB/s),
 io=80.0GiB (85.9GB), run=52210-52211msec
   * roundrobin
 READ: bw=1733MiB/s (1817MB/s), 217MiB/s-217MiB/s (227MB/s-227MB/s),
 io=80.0GiB (85.9GB), run=47269-47271msec
- raid1c3 with 2 HDDs and 1 SSD:
   2 x Segate Barracuda ST2000DM008 (2TB)
   1 x Crucial CT256M550SSD1 (256GB)
   * pid policy
 READ: bw=1843MiB/s (1932MB/s), 230MiB/s-230MiB/s (242MB/s-242MB/s),
 io=80.0GiB (85.9GB), run=9-44450msec
   * roundrobin
 READ: bw=2485MiB/s (2605MB/s), 311MiB/s-311MiB/s (326MB/s-326MB/s),
 io=80.0GiB (85.9GB), run=32969-32970msec



 Both of the above test cases are sequential. How about some random IO
 workload?

 Also, the seek time for non-rotational devices does not exist. So it is
 a good idea to test with ssd + nvme and all nvme or all ssd.


To measure the performance of each policy and find optimal penalty
values, I created scripts which are available here:

https://gitlab.com/vadorovsky/btrfs-perf
https://github.com/mrostecki/btrfs-perf

Michal Rostecki (6):




   btrfs: Add inflight BIO request counter
   btrfs: Store the last device I/O offset


These patches look good. But as only round-robin policy requires
to monitor the inflight and last-offset. Could you bring them under
if policy=roundrobin? Otherwise, it is just a waste of CPU cycles
if the policy != roundrobin.

>btrfs: Add stripe_physical function
>btrfs: Check if the filesystem is has mixed type of devices

Thanks, Anand


   btrfs: sysfs: Add directory for read policies
   btrfs: Add roundrobin raid1 read policy

  fs/btrfs/ctree.h   |   3 +
  fs/btrfs/disk-io.c |   3 +
  fs/btrfs/sysfs.c   | 144 ++
  fs/btrfs/volumes.c | 218 +++--
  fs/btrfs/volumes.h |  22 +
  5 files changed, 366 insertions(+), 24 deletions(-)





Re: [PATCH] btrfs: remove redundant NULL check

2021-01-21 Thread Anand Jain

On 21/1/21 4:19 pm, Yang Li wrote:

Fix below warnings reported by coccicheck:
./fs/btrfs/raid56.c:237:2-8: WARNING: NULL check before some freeing
functions is not needed.

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
  fs/btrfs/raid56.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 93fbf87..5394641 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -233,8 +233,7 @@ int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info 
*info)
}
  
  	x = cmpxchg(>stripe_hash_table, NULL, table);

-   if (x)
-   kvfree(x);
+   kvfree(x);




 Reviewed-by: Anand Jain 

Thanks, Anand



return 0;
  }
  





Re: [PATCH v2] btrfs: return EAGAIN when receiving a pending signal in the defrag loops

2020-11-18 Thread Anand Jain

On 18/11/20 12:02 pm, xiakaixu1...@gmail.com wrote:

From: Kaixu Xia 

The variable ret is overwritten by the following variable defrag_count.
Actually the code should return EAGAIN when receiving a pending signal
in the defrag loops.

Reported-by: Tosk Robot 
Signed-off-by: Kaixu Xia 
---
v2
  -return EAGAIN instead of remove the EAGAIN error.


 Sorry I might have missed in v1. Why was EAGAIN needed here?
 Return of defrag_count rather makes sense to me as of now.

Thanks, Anand



  fs/btrfs/ioctl.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 69a384145dc6..6f13db6d30bd 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1519,7 +1519,7 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
if (btrfs_defrag_cancelled(fs_info)) {
btrfs_debug(fs_info, "defrag_file cancelled");
ret = -EAGAIN;
-   break;
+   goto out_ra;
}
  
  		if (!should_defrag_range(inode, (u64)i << PAGE_SHIFT,






Re: WARNING in close_fs_devices (2)

2020-10-30 Thread Anand Jain




On 30/10/20 6:10 pm, Dmitry Vyukov wrote:

On Tue, Sep 22, 2020 at 2:37 PM Anand Jain  wrote:


On 18/9/20 7:22 pm, syzbot wrote:

Hello,

syzbot found the following issue on:

HEAD commit:e4c26faa Merge tag 'usb-5.9-rc5' of git://git.kernel.org/p..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15bf162190
kernel config:  https://syzkaller.appspot.com/x/.config?x=c61610091f4ca8c4
dashboard link: https://syzkaller.appspot.com/bug?extid=4cfe71a4da060be47502
compiler:   gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4cfe71a4da060be47...@syzkaller.appspotmail.com

[ cut here ]
WARNING: CPU: 1 PID: 3612 at fs/btrfs/volumes.c:1166 
close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 3612 Comm: syz-executor.2 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x198/0x1fd lib/dump_stack.c:118
   panic+0x347/0x7c0 kernel/panic.c:231
   __warn.cold+0x20/0x46 kernel/panic.c:600
   report_bug+0x1bd/0x210 lib/bug.c:198
   handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234
   exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254
   asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
Code: 0f b6 04 02 84 c0 74 02 7e 33 48 8b 44 24 18 c6 80 30 01 00 00 00 48 83 c4 30 
5b 5d 41 5c 41 5d 41 5e 41 5f c3 e8 99 ce 6a fe <0f> 0b e9 71 ff ff ff e8 8d ce 
6a fe 0f 0b e9 20 ff ff ff e8 d1 d5
RSP: 0018:c900091777e0 EFLAGS: 00010246
RAX: 0004 RBX:  RCX: c9000c8b7000
RDX: 0004 RSI: 83097f47 RDI: 0007
RBP: dc00 R08: 0001 R09: 8880988a187f
R10:  R11: 0001 R12: 88809593a130
R13: 88809593a1ec R14: 8880988a1908 R15: 88809593a050
   close_fs_devices fs/btrfs/volumes.c:1193 [inline]
   btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
   open_ctree+0x4984/0x4a2d fs/btrfs/disk-io.c:3434
   btrfs_fill_super fs/btrfs/super.c:1316 [inline]
   btrfs_mount_root.cold+0x14/0x165 fs/btrfs/super.c:1672
   legacy_get_tree+0x105/0x220 fs/fs_context.c:592
   vfs_get_tree+0x89/0x2f0 fs/super.c:1547
   fc_mount fs/namespace.c:978 [inline]
   vfs_kern_mount.part.0+0xd3/0x170 fs/namespace.c:1008
   vfs_kern_mount+0x3c/0x60 fs/namespace.c:995
   btrfs_mount+0x234/0xaa0 fs/btrfs/super.c:1732
   legacy_get_tree+0x105/0x220 fs/fs_context.c:592
   vfs_get_tree+0x89/0x2f0 fs/super.c:1547
   do_new_mount fs/namespace.c:2875 [inline]
   path_mount+0x1387/0x2070 fs/namespace.c:3192
   do_mount fs/namespace.c:3205 [inline]
   __do_sys_mount fs/namespace.c:3413 [inline]
   __se_sys_mount fs/namespace.c:3390 [inline]
   __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x46004a
Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd 89 fb ff c3 66 2e 0f 1f 84 00 
00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da 89 
fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:7f414d78da88 EFLAGS: 0202 ORIG_RAX: 00a5
RAX: ffda RBX: 7f414d78db20 RCX: 0046004a
RDX: 2000 RSI: 2100 RDI: 7f414d78dae0
RBP: 7f414d78dae0 R08: 7f414d78db20 R09: 2000
R10:  R11: 0202 R12: 2000
R13: 2100 R14: 2200 R15: 2001a800
Kernel Offset: disabled
Rebooting in 86400 seconds..



#syz fix: btrfs: fix rw_devices count in __btrfs_free_extra_devids


Is it the correct patch title? It still does not exist anywhere
including linux-next...



V2 is here [1]. The patch title is changed. Also the cover letter talks 
about the path dependency. It is not yet integrated.


[1]
https://patchwork.kernel.org/project/linux-btrfs/patch/d0b5790792b8b826504dd239ad9efc514f3d9109.1604009248.git.anand.j...@oracle.com/


Thanks, Anand




[PATCH stable-5.4] backport enospc issues during balance

2020-10-13 Thread Anand Jain
Patch 1 is a preparatory patch to reduce conflicts. Patch 2 fixes
balance failure due to ENOSPC in btrfs/156 on arm64 systems with
pagesize=64k. Minor conflicts in fs/btrfs/block-group.c are resolved.
Thanks.

Josef Bacik (2):
  btrfs: don't pass system_chunk into can_overcommit
  btrfs: take overcommit into account in inc_block_group_ro

 fs/btrfs/block-group.c | 38 ++--
 fs/btrfs/space-info.c  | 50 +-
 fs/btrfs/space-info.h  |  3 +++
 3 files changed, 49 insertions(+), 42 deletions(-)

-- 
2.18.4



[PATCH stable-5.4 1/2] btrfs: don't pass system_chunk into can_overcommit

2020-10-13 Thread Anand Jain
From: Josef Bacik 

commit 9f246926b4d5db4c5e8c78e4897757de26c95be6 upstream

We have the space_info, we can just check its flags to see if it's the
system chunk space info.

Reviewed-by: Nikolay Borisov 
Reviewed-by: Qu Wenruo 
Reviewed-by: Johannes Thumshirn 
Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba 
Signed-off-by: Anand Jain 
---
 fs/btrfs/space-info.c | 42 +++---
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 6f484f0d347e..e19e538d05f9 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -162,8 +162,7 @@ static inline u64 calc_global_rsv_need_space(struct 
btrfs_block_rsv *global)
 
 static int can_overcommit(struct btrfs_fs_info *fs_info,
  struct btrfs_space_info *space_info, u64 bytes,
- enum btrfs_reserve_flush_enum flush,
- bool system_chunk)
+ enum btrfs_reserve_flush_enum flush)
 {
u64 profile;
u64 avail;
@@ -174,7 +173,7 @@ static int can_overcommit(struct btrfs_fs_info *fs_info,
if (space_info->flags & BTRFS_BLOCK_GROUP_DATA)
return 0;
 
-   if (system_chunk)
+   if (space_info->flags & BTRFS_BLOCK_GROUP_SYSTEM)
profile = btrfs_system_alloc_profile(fs_info);
else
profile = btrfs_metadata_alloc_profile(fs_info);
@@ -228,8 +227,7 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info 
*fs_info,
 
/* Check and see if our ticket can be satisified now. */
if ((used + ticket->bytes <= space_info->total_bytes) ||
-   can_overcommit(fs_info, space_info, ticket->bytes, flush,
-  false)) {
+   can_overcommit(fs_info, space_info, ticket->bytes, flush)) {
btrfs_space_info_update_bytes_may_use(fs_info,
  space_info,
  ticket->bytes);
@@ -634,8 +632,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 
 static inline u64
 btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
-struct btrfs_space_info *space_info,
-bool system_chunk)
+struct btrfs_space_info *space_info)
 {
struct reserve_ticket *ticket;
u64 used;
@@ -651,13 +648,12 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info 
*fs_info,
 
to_reclaim = min_t(u64, num_online_cpus() * SZ_1M, SZ_16M);
if (can_overcommit(fs_info, space_info, to_reclaim,
-  BTRFS_RESERVE_FLUSH_ALL, system_chunk))
+  BTRFS_RESERVE_FLUSH_ALL))
return 0;
 
used = btrfs_space_info_used(space_info, true);
 
-   if (can_overcommit(fs_info, space_info, SZ_1M,
-  BTRFS_RESERVE_FLUSH_ALL, system_chunk))
+   if (can_overcommit(fs_info, space_info, SZ_1M, BTRFS_RESERVE_FLUSH_ALL))
expected = div_factor_fine(space_info->total_bytes, 95);
else
expected = div_factor_fine(space_info->total_bytes, 90);
@@ -673,7 +669,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info 
*fs_info,
 
 static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *space_info,
-   u64 used, bool system_chunk)
+   u64 used)
 {
u64 thresh = div_factor_fine(space_info->total_bytes, 98);
 
@@ -681,8 +677,7 @@ static inline int need_do_async_reclaim(struct 
btrfs_fs_info *fs_info,
if ((space_info->bytes_used + space_info->bytes_reserved) >= thresh)
return 0;
 
-   if (!btrfs_calc_reclaim_metadata_size(fs_info, space_info,
- system_chunk))
+   if (!btrfs_calc_reclaim_metadata_size(fs_info, space_info))
return 0;
 
return (used >= thresh && !btrfs_fs_closing(fs_info) &&
@@ -805,8 +800,7 @@ static void btrfs_async_reclaim_metadata_space(struct 
work_struct *work)
space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
 
spin_lock(_info->lock);
-   to_reclaim = btrfs_calc_reclaim_metadata_size(fs_info, space_info,
- false);
+   to_reclaim = btrfs_calc_reclaim_metadata_size(fs_info, space_info);
if (!to_reclaim) {
space_info->flush = 0;
spin_unlock(_info->lock);
@@ -825,8 +819,7 @@ static void btrfs_async_reclaim_metadata_space(struct 
work_struct *work)
return;
 

[PATCH stable-5.4 2/2] btrfs: take overcommit into account in inc_block_group_ro

2020-10-13 Thread Anand Jain
From: Josef Bacik 

commit a30a3d2067536cbcce26c055e70cc3a6ae4fd45c upstream

inc_block_group_ro does a calculation to see if we have enough room left
over if we mark this block group as read only in order to see if it's ok
to mark the block group as read only.

The problem is this calculation _only_ works for data, where our used is
always less than our total.  For metadata we will overcommit, so this
will almost always fail for metadata.

Fix this by exporting btrfs_can_overcommit, and then see if we have
enough space to remove the remaining free space in the block group we
are trying to mark read only.  If we do then we can mark this block
group as read only.

Reviewed-by: Qu Wenruo 
Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba 
Signed-off-by: Anand Jain 
---
 fs/btrfs/block-group.c | 38 ++
 fs/btrfs/space-info.c  | 18 ++
 fs/btrfs/space-info.h  |  3 +++
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index b167649f5f5d..ace49a999ece 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1186,7 +1186,6 @@ static int inc_block_group_ro(struct 
btrfs_block_group_cache *cache, int force)
 {
struct btrfs_space_info *sinfo = cache->space_info;
u64 num_bytes;
-   u64 sinfo_used;
u64 min_allocable_bytes;
int ret = -ENOSPC;
 
@@ -1213,20 +1212,38 @@ static int inc_block_group_ro(struct 
btrfs_block_group_cache *cache, int force)
 
num_bytes = cache->key.offset - cache->reserved - cache->pinned -
cache->bytes_super - btrfs_block_group_used(>item);
-   sinfo_used = btrfs_space_info_used(sinfo, true);
 
/*
-* sinfo_used + num_bytes should always <= sinfo->total_bytes.
-*
-* Here we make sure if we mark this bg RO, we still have enough
-* free space as buffer (if min_allocable_bytes is not 0).
+* Data never overcommits, even in mixed mode, so do just the straight
+* check of left over space in how much we have allocated.
 */
-   if (sinfo_used + num_bytes + min_allocable_bytes <=
-   sinfo->total_bytes) {
+   if (force) {
+   ret = 0;
+   } else if (sinfo->flags & BTRFS_BLOCK_GROUP_DATA) {
+   u64 sinfo_used = btrfs_space_info_used(sinfo, true);
+
+   /*
+* Here we make sure if we mark this bg RO, we still have enough
+* free space as buffer.
+*/
+   if (sinfo_used + num_bytes <= sinfo->total_bytes)
+   ret = 0;
+   } else {
+   /*
+* We overcommit metadata, so we need to do the
+* btrfs_can_overcommit check here, and we need to pass in
+* BTRFS_RESERVE_NO_FLUSH to give ourselves the most amount of
+* leeway to allow us to mark this block group as read only.
+*/
+   if (btrfs_can_overcommit(cache->fs_info, sinfo, num_bytes,
+BTRFS_RESERVE_NO_FLUSH))
+   ret = 0;
+   }
+
+   if (!ret) {
sinfo->bytes_readonly += num_bytes;
cache->ro++;
list_add_tail(>ro_list, >ro_bgs);
-   ret = 0;
}
 out:
spin_unlock(>lock);
@@ -1235,9 +1252,6 @@ static int inc_block_group_ro(struct 
btrfs_block_group_cache *cache, int force)
btrfs_info(cache->fs_info,
"unable to make block group %llu ro",
cache->key.objectid);
-   btrfs_info(cache->fs_info,
-   "sinfo_used=%llu bg_num_bytes=%llu min_allocable=%llu",
-   sinfo_used, num_bytes, min_allocable_bytes);
btrfs_dump_space_info(cache->fs_info, cache->space_info, 0, 0);
}
return ret;
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index e19e538d05f9..90500b6c41fc 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -160,9 +160,9 @@ static inline u64 calc_global_rsv_need_space(struct 
btrfs_block_rsv *global)
return (global->size << 1);
 }
 
-static int can_overcommit(struct btrfs_fs_info *fs_info,
- struct btrfs_space_info *space_info, u64 bytes,
- enum btrfs_reserve_flush_enum flush)
+int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
+struct btrfs_space_info *space_info, u64 bytes,
+enum btrfs_reserve_flush_enum flush)
 {
u64 profile;
u64 avail;
@@ -227,7 +227,8 @@ void btrfs_try_granting_tickets(struct btrfs_fs_info 
*fs_info,
 
/* Check and see if our ticket can be satisified now. */
  

Re: [PATCH v2 add prerequisite-patch-id] btrfs: fix devid 0 without a replace item by failing the mount

2020-10-11 Thread Anand Jain



Accidentally this patch went to the stable. Pls, ignore. Sorry for the 
noise. Thanks.



On 12/10/20 1:26 pm, Anand Jain wrote:

If there is a BTRFS_DEV_REPLACE_DEVID without a replace item, then
it means some device is trying to attack or may be corrupted. Fail the
mount so that the user can remove the attacking or fix the corrupted
device.

As of now if BTRFS_DEV_REPLACE_DEVID is present without the replace
item, in __btrfs_free_extra_devids() we determine that there is an
extra device, and free those extra devices but continue to mount the
device.
However, we were wrong in keeping tack of the rw_devices so the syzbot
testcase failed as below [1].

[1]
WARNING: CPU: 1 PID: 3612 at fs/btrfs/volumes.c:1166 
close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 3612 Comm: syz-executor.2 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x198/0x1fd lib/dump_stack.c:118
  panic+0x347/0x7c0 kernel/panic.c:231
  __warn.cold+0x20/0x46 kernel/panic.c:600
  report_bug+0x1bd/0x210 lib/bug.c:198
  handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234
  exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254
  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
Code: 0f b6 04 02 84 c0 74 02 7e 33 48 8b 44 24 18 c6 80 30 01 00 00 00 48 83 c4 30 
5b 5d 41 5c 41 5d 41 5e 41 5f c3 e8 99 ce 6a fe <0f> 0b e9 71 ff ff ff e8 8d ce 
6a fe 0f 0b e9 20 ff ff ff e8 d1 d5
RSP: 0018:c900091777e0 EFLAGS: 00010246
RAX: 0004 RBX:  RCX: c9000c8b7000
RDX: 0004 RSI: 83097f47 RDI: 0007
RBP: dc00 R08: 0001 R09: 8880988a187f
R10:  R11: 0001 R12: 88809593a130
R13: 88809593a1ec R14: 8880988a1908 R15: 88809593a050
  close_fs_devices fs/btrfs/volumes.c:1193 [inline]
  btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
  open_ctree+0x4984/0x4a2d fs/btrfs/disk-io.c:3434
  btrfs_fill_super fs/btrfs/super.c:1316 [inline]
  btrfs_mount_root.cold+0x14/0x165 fs/btrfs/super.c:1672

The fix here is, when we determine that there isn't a replace item
then fail the mount if there is a replace target device (devid 0).

Reported-by: syzbot+4cfe71a4da060be47...@syzkaller.appspotmail.com
Signed-off-by: Anand Jain 
---
Depends on the patches
  btrfs: drop never met condition of disk_total_bytes == 0
  btrfs: fix btrfs_find_device unused arg seed
If these patches aren't integrated yet, then please add the last arg in
the function btrfs_find_device(). Any value is fine as it doesn't care.

fstest case will follow.

v2: changed title
 old: btrfs: fix rw_devices count in __btrfs_free_extra_devids

 In btrfs_init_dev_replace() try to match the presence of replace_item
 to the BTRFS_DEV_REPLACE_DEVID device. If fails then fail the
 mount. So drop the similar check in __btrfs_free_extra_devids().

  fs/btrfs/dev-replace.c | 26 --
  fs/btrfs/volumes.c | 26 +++---
  2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 9340d03661cd..e2b7ae386224 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -91,6 +91,17 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
ret = btrfs_search_slot(NULL, dev_root, , path, 0, 0);
if (ret) {
  no_valid_dev_replace_entry_found:
+   /*
+* We don't have a replace item or it's corrupted.
+* If there is a replace target, fail the mount.
+*/
+   if (btrfs_find_device(fs_info->fs_devices,
+ BTRFS_DEV_REPLACE_DEVID, NULL, NULL)) {
+   btrfs_err(fs_info,
+   "found replace target device without a replace item");
+   ret = -EIO;
+   goto out;
+   }
ret = 0;
dev_replace->replace_state =
BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED;
@@ -143,8 +154,19 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED:
-   dev_replace->srcdev = NULL;
-   dev_replace->tgtdev = NULL;
+   /*
+* We don't have an active replace item but if there is a
+* replace target, fail the mount.
+*/
+   if (btrfs_find_device(fs_info->fs_devices,
+ BTRFS_DEV_REPLACE_DEVID, NULL, NULL)) {
+ 

[PATCH v2 add prerequisite-patch-id] btrfs: fix devid 0 without a replace item by failing the mount

2020-10-11 Thread Anand Jain
If there is a BTRFS_DEV_REPLACE_DEVID without a replace item, then
it means some device is trying to attack or may be corrupted. Fail the
mount so that the user can remove the attacking or fix the corrupted
device.

As of now if BTRFS_DEV_REPLACE_DEVID is present without the replace
item, in __btrfs_free_extra_devids() we determine that there is an
extra device, and free those extra devices but continue to mount the
device.
However, we were wrong in keeping tack of the rw_devices so the syzbot
testcase failed as below [1].

[1]
WARNING: CPU: 1 PID: 3612 at fs/btrfs/volumes.c:1166 
close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 3612 Comm: syz-executor.2 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x198/0x1fd lib/dump_stack.c:118
 panic+0x347/0x7c0 kernel/panic.c:231
 __warn.cold+0x20/0x46 kernel/panic.c:600
 report_bug+0x1bd/0x210 lib/bug.c:198
 handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234
 exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
Code: 0f b6 04 02 84 c0 74 02 7e 33 48 8b 44 24 18 c6 80 30 01 00 00 00 48 83 
c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 e8 99 ce 6a fe <0f> 0b e9 71 ff ff ff e8 
8d ce 6a fe 0f 0b e9 20 ff ff ff e8 d1 d5
RSP: 0018:c900091777e0 EFLAGS: 00010246
RAX: 0004 RBX:  RCX: c9000c8b7000
RDX: 0004 RSI: 83097f47 RDI: 0007
RBP: dc00 R08: 0001 R09: 8880988a187f
R10:  R11: 0001 R12: 88809593a130
R13: 88809593a1ec R14: 8880988a1908 R15: 88809593a050
 close_fs_devices fs/btrfs/volumes.c:1193 [inline]
 btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
 open_ctree+0x4984/0x4a2d fs/btrfs/disk-io.c:3434
 btrfs_fill_super fs/btrfs/super.c:1316 [inline]
 btrfs_mount_root.cold+0x14/0x165 fs/btrfs/super.c:1672

The fix here is, when we determine that there isn't a replace item
then fail the mount if there is a replace target device (devid 0).

Reported-by: syzbot+4cfe71a4da060be47...@syzkaller.appspotmail.com
Signed-off-by: Anand Jain 
---
Depends on the patches
 btrfs: drop never met condition of disk_total_bytes == 0
 btrfs: fix btrfs_find_device unused arg seed
If these patches aren't integrated yet, then please add the last arg in
the function btrfs_find_device(). Any value is fine as it doesn't care.

fstest case will follow.

v2: changed title
old: btrfs: fix rw_devices count in __btrfs_free_extra_devids

In btrfs_init_dev_replace() try to match the presence of replace_item
to the BTRFS_DEV_REPLACE_DEVID device. If fails then fail the
mount. So drop the similar check in __btrfs_free_extra_devids().

 fs/btrfs/dev-replace.c | 26 --
 fs/btrfs/volumes.c | 26 +++---
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 9340d03661cd..e2b7ae386224 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -91,6 +91,17 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
ret = btrfs_search_slot(NULL, dev_root, , path, 0, 0);
if (ret) {
 no_valid_dev_replace_entry_found:
+   /*
+* We don't have a replace item or it's corrupted.
+* If there is a replace target, fail the mount.
+*/
+   if (btrfs_find_device(fs_info->fs_devices,
+ BTRFS_DEV_REPLACE_DEVID, NULL, NULL)) {
+   btrfs_err(fs_info,
+   "found replace target device without a replace item");
+   ret = -EIO;
+   goto out;
+   }
ret = 0;
dev_replace->replace_state =
BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED;
@@ -143,8 +154,19 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED:
-   dev_replace->srcdev = NULL;
-   dev_replace->tgtdev = NULL;
+   /*
+* We don't have an active replace item but if there is a
+* replace target, fail the mount.
+*/
+   if (btrfs_find_device(fs_info->fs_devices,
+ BTRFS_DEV_REPLACE_DEVID, NULL, NULL)) {
+   btrfs_err(fs_info,
+   "replace devid present without an active replace item");
+

[PATCH stable-5.4 4/6] btrfs: Ensure we trim ranges across block group boundary

2020-10-08 Thread Anand Jain
From: Qu Wenruo 

commit 6b7faadd985c990324b5b5bd18cc4ba5c395eb65 upstream.

[BUG]
When deleting large files (which cross block group boundary) with
discard mount option, we find some btrfs_discard_extent() calls only
trimmed part of its space, not the whole range:

  btrfs_discard_extent: type=0x1 start=19626196992 len=2144530432 
trimmed=1073741824 ratio=50%

type:   bbio->map_type, in above case, it's SINGLE DATA.
start:  Logical address of this trim
len:Logical length of this trim
trimmed:Physically trimmed bytes
ratio:  trimmed / len

Thus leaving some unused space not discarded.

[CAUSE]
When discard mount option is specified, after a transaction is fully
committed (super block written to disk), we begin to cleanup pinned
extents in the following call chain:

btrfs_commit_transaction()
|- btrfs_finish_extent_commit()
   |- find_first_extent_bit(unpin, 0, , , EXTENT_DIRTY);
   |- btrfs_discard_extent()

However, pinned extents are recorded in an extent_io_tree, which can
merge adjacent extent states.

When a large file gets deleted and it has adjacent file extents across
block group boundary, we will get a large merged range like this:

  |<---BG1--->|<---  BG2 --->|
  |//|<--   Range to discard   --->|/|

To discard that range, we have the following calls:

  btrfs_discard_extent()
  |- btrfs_map_block()
  |  Returned bbio will end at BG1's end. As btrfs_map_block()
  |  never returns result across block group boundary.
  |- btrfs_issuse_discard()
 Issue discard for each stripe.

So we will only discard the range in BG1, not the remaining part in BG2.

Furthermore, this bug is not that reliably observed, for above case, if
there is no other extent in BG2, BG2 will be empty and btrfs will trim
all space of BG2, covering up the bug.

[FIX]
- Allow __btrfs_map_block_for_discard() to modify @length parameter
  btrfs_map_block() uses its @length paramter to notify the caller how
  many bytes are mapped in current call.
  With __btrfs_map_block_for_discard() also modifing the @length,
  btrfs_discard_extent() now understands when to do extra trim.

- Call btrfs_map_block() in a loop until we hit the range end Since we
  now know how many bytes are mapped each time, we can iterate through
  each block group boundary and issue correct trim for each range.

Reviewed-by: Filipe Manana 
Reviewed-by: Nikolay Borisov 
Tested-by: Nikolay Borisov 
Reviewed-by: Josef Bacik 
Signed-off-by: Qu Wenruo 
Signed-off-by: David Sterba 
Signed-off-by: Anand Jain 
---
 fs/btrfs/extent-tree.c | 41 +++--
 fs/btrfs/volumes.c |  6 --
 2 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ef05cbacef73..a546212594e8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1306,8 +1306,10 @@ static int btrfs_issue_discard(struct block_device 
*bdev, u64 start, u64 len,
 int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 u64 num_bytes, u64 *actual_bytes)
 {
-   int ret;
+   int ret = 0;
u64 discarded_bytes = 0;
+   u64 end = bytenr + num_bytes;
+   u64 cur = bytenr;
struct btrfs_bio *bbio = NULL;
 
 
@@ -1316,15 +1318,23 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, 
u64 bytenr,
 * associated to its stripes that don't go away while we are discarding.
 */
btrfs_bio_counter_inc_blocked(fs_info);
-   /* Tell the block device(s) that the sectors can be discarded */
-   ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, bytenr, _bytes,
- , 0);
-   /* Error condition is -ENOMEM */
-   if (!ret) {
-   struct btrfs_bio_stripe *stripe = bbio->stripes;
+   while (cur < end) {
+   struct btrfs_bio_stripe *stripe;
int i;
 
+   num_bytes = end - cur;
+   /* Tell the block device(s) that the sectors can be discarded */
+   ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, cur,
+ _bytes, , 0);
+   /*
+* Error can be -ENOMEM, -ENOENT (no such chunk mapping) or
+* -EOPNOTSUPP. For any such error, @num_bytes is not updated,
+* thus we can't continue anyway.
+*/
+   if (ret < 0)
+   goto out;
 
+   stripe = bbio->stripes;
for (i = 0; i < bbio->num_stripes; i++, stripe++) {
u64 bytes;
struct request_queue *req_q;
@@ -1341,10 +1351,19 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, 
u64 bytenr,
  stripe->physical,
  

[PATCH stable-5.4 0/6] backport fixes for xfstests btrfs/199,200,203,204

2020-10-08 Thread Anand Jain
This series backports fixes for the xfstests test cases btrfs/199
btrfs/200 btrfs/203 and btrfs/204.

patch 1 is fix for btrfs/200
patch 2 fixes regression in patch 1 and is fix for btrfs/203 
patch 3 helps to fix conflicts in patch 4
patch 4 is fix for btrfs/199
patch 5 helps to avoid conflicts in patch6
patch 6 is fix for btrfs/204

patch1 Btrfs: send, allow clone operations within the same file
patch2 Btrfs: send, fix emission of invalid clone operations within the same 
file
patch3 btrfs: volumes: Use more straightforward way to calculate map length
patch4 btrfs: Ensure we trim ranges across block group boundary
patch5 btrfs: fix RWF_NOWAIT write not failling when we need to cow
patch6 btrfs: allow btrfs_truncate_block() to fallback to nocow for data space 
reservation

Filipe Manana (3):
  Btrfs: send, allow clone operations within the same file
  Btrfs: send, fix emission of invalid clone operations within the same
file
  btrfs: fix RWF_NOWAIT write not failling when we need to cow

Qu Wenruo (3):
  btrfs: volumes: Use more straightforward way to calculate map length
  btrfs: Ensure we trim ranges across block group boundary
  btrfs: allow btrfs_truncate_block() to fallback to nocow for data
space reservation

 fs/btrfs/ctree.h   |  2 ++
 fs/btrfs/extent-tree.c | 41 +--
 fs/btrfs/file.c| 23 ++
 fs/btrfs/inode.c   | 44 +++---
 fs/btrfs/send.c| 19 +-
 fs/btrfs/volumes.c |  8 +---
 6 files changed, 108 insertions(+), 29 deletions(-)

-- 
2.25.1



[PATCH stable-5.4 5/6] btrfs: fix RWF_NOWAIT write not failling when we need to cow

2020-10-08 Thread Anand Jain
From: Filipe Manana 

commit 260a63395f90f67d6ab89e4266af9e3dc34a77e9 upstream.

If we attempt to do a RWF_NOWAIT write against a file range for which we
can only do NOCOW for a part of it, due to the existence of holes or
shared extents for example, we proceed with the write as if it were
possible to NOCOW the whole range.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ touch /mnt/sdj/bar
  $ chattr +C /mnt/sdj/bar

  $ xfs_io -d -c "pwrite -S 0xab -b 256K 0 256K" /mnt/bar
  wrote 262144/262144 bytes at offset 0
  256 KiB, 1 ops; 0.0003 sec (694.444 MiB/sec and 2777.7778 ops/sec)

  $ xfs_io -c "fpunch 64K 64K" /mnt/bar
  $ sync

  $ xfs_io -d -c "pwrite -N -V 1 -b 128K -S 0xfe 0 128K" /mnt/bar
  wrote 131072/131072 bytes at offset 0
  128 KiB, 1 ops; 0.0007 sec (160.051 MiB/sec and 1280.4097 ops/sec)

This last write should fail with -EAGAIN since the file range from 64K to
128K is a hole. On xfs it fails, as expected, but on ext4 it currently
succeeds because apparently it is expensive to check if there are extents
allocated for the whole range, but I'll check with the ext4 people.

Fix the issue by checking if check_can_nocow() returns a number of
NOCOW'able bytes smaller then the requested number of bytes, and if it
does return -EAGAIN.

Fixes: edf064e7c6fec3 ("btrfs: nowait aio support")
CC: sta...@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba 
Signed-off-by: Anand Jain 
---
 fs/btrfs/file.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4e4ddd5629e5..4d0e1f9452a5 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1919,13 +1919,27 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
pos = iocb->ki_pos;
count = iov_iter_count(from);
if (iocb->ki_flags & IOCB_NOWAIT) {
+   size_t nocow_bytes = count;
+
/*
 * We will allocate space in case nodatacow is not set,
 * so bail
 */
if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
  BTRFS_INODE_PREALLOC)) ||
-   check_can_nocow(BTRFS_I(inode), pos, ) <= 0) {
+   check_can_nocow(BTRFS_I(inode), pos, _bytes) <= 0) {
+   inode_unlock(inode);
+   return -EAGAIN;
+   }
+
+   /* check_can_nocow() locks the snapshot lock on success */
+   btrfs_end_write_no_snapshotting(root);
+   /*
+* There are holes in the range or parts of the range that must
+* be COWed (shared extents, RO block groups, etc), so just bail
+* out.
+*/
+   if (nocow_bytes < count) {
inode_unlock(inode);
return -EAGAIN;
}
-- 
2.25.1



[PATCH stable-5.4 6/6] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation

2020-10-08 Thread Anand Jain
From: Qu Wenruo 

commit 6d4572a9d71d5fc2affee0258d8582d39859188c upstream.

[BUG]
When the data space is exhausted, even if the inode has NOCOW attribute,
we will still refuse to truncate unaligned range due to ENOSPC.

The following script can reproduce it pretty easily:
  #!/bin/bash

  dev=/dev/test/test
  mnt=/mnt/btrfs

  umount $dev &> /dev/null
  umount $mnt &> /dev/null

  mkfs.btrfs -f $dev -b 1G
  mount -o nospace_cache $dev $mnt
  touch $mnt/foobar
  chattr +C $mnt/foobar

  xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
  xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
  sync

  xfs_io -c "fpunch 0 2k" $mnt/foobar
  umount $mnt

Currently this will fail at the fpunch part.

[CAUSE]
Because btrfs_truncate_block() always reserves space without checking
the NOCOW attribute.

Since the writeback path follows NOCOW bit, we only need to bother the
space reservation code in btrfs_truncate_block().

[FIX]
Make btrfs_truncate_block() follow btrfs_buffered_write() to try to
reserve data space first, and fall back to NOCOW check only when we
don't have enough space.

Such always-try-reserve is an optimization introduced in
btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.

This patch will export check_can_nocow() as btrfs_check_can_nocow(), and
use it in btrfs_truncate_block() to fix the problem.

Reported-by: Martin Doucha 
Reviewed-by: Filipe Manana 
Reviewed-by: Anand Jain 
Signed-off-by: Qu Wenruo 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba 
Signed-off-by: Anand Jain 
 Conflicts:
fs/btrfs/ctree.h
fs/btrfs/file.c
fs/btrfs/inode.c
---
 fs/btrfs/ctree.h |  2 ++
 fs/btrfs/file.c  |  9 +
 fs/btrfs/inode.c | 44 +---
 3 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9a690c10afaa..23b4f38e2392 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2956,6 +2956,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t 
start, loff_t end);
 loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
  struct file *file_out, loff_t pos_out,
  loff_t len, unsigned int remap_flags);
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4d0e1f9452a5..4126513e2429 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1546,8 +1546,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode 
*inode, struct page **pages,
return ret;
 }
 
-static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
-   size_t *write_bytes)
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes)
 {
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct btrfs_root *root = inode->root;
@@ -1647,7 +1647,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb 
*iocb,
if (ret < 0) {
if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
  BTRFS_INODE_PREALLOC)) &&
-   check_can_nocow(BTRFS_I(inode), pos,
+   btrfs_check_can_nocow(BTRFS_I(inode), pos,
_bytes) > 0) {
/*
 * For nodata cow case, no need to reserve
@@ -1927,7 +1927,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 */
if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
  BTRFS_INODE_PREALLOC)) ||
-   check_can_nocow(BTRFS_I(inode), pos, _bytes) <= 0) {
+   btrfs_check_can_nocow(BTRFS_I(inode), pos,
+ _bytes) <= 0) {
inode_unlock(inode);
return -EAGAIN;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9ac40991a640..2356b28afebe 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5133,11 +5133,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t 
from, loff_t len,
struct extent_state *cached_state = NULL;
struct extent_changeset *data_reserved = NULL;
char *kaddr;
+   bool only_release_metadata = false;
u32 blocksize = fs_info->sectorsize;
pgoff_t index = from >> PAGE_SHIFT;
unsigned offset = from & (blocksize - 1);
struct page *page;
gfp_t mask = btrfs_alloc_write_mask(mapping);
+   size_t write_bytes = blocksize;
int ret = 0;
  

[PATCH stable-5.4 2/6] Btrfs: send, fix emission of invalid clone operations within the same file

2020-10-08 Thread Anand Jain
From: Filipe Manana 

commit 9722b10148504c4153a74a9c89725af271e490fc upstream.

When doing an incremental send and a file has extents shared with itself
at different file offsets, it's possible for send to emit clone operations
that will fail at the destination because the source range goes beyond the
file's current size. This happens when the file size has increased in the
send snapshot, there is a hole between the shared extents and both shared
extents are at file offsets which are greater the file's size in the
parent snapshot.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt/sdb

  $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar
  $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
  $ btrfs send -f /tmp/1.snap /mnt/sdb/base

  # Create a 320K extent at file offset 512K.
  $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar

  # Clone part of that 320K extent into a lower file offset (192K).
  # This file offset is greater than the file's size in the parent
  # snapshot (64K). Also the clone range is a bit behind the offset of
  # the 320K extent so that we leave a hole between the shared extents.
  $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar

  $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
  $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt/sdc

  $ btrfs receive -f /tmp/1.snap /mnt/sdc
  $ btrfs receive -f /tmp/2.snap /mnt/sdc
  ERROR: failed to clone extents to foobar: Invalid argument

The problem is that after processing the extent at file offset 256K, which
refers to the first 128K of the 320K extent created by the buffered write
operations, we have 'cur_inode_next_write_offset' set to 384K, which
corresponds to the end offset of the partially shared extent (256K + 128K)
and to the current file size in the receiver. Then when we process the
extent at offset 512K, we do extent backreference iteration to figure out
if we can clone the extent from some other inode or from the same inode,
and we consider the extent at offset 256K of the same inode as a valid
source for a clone operation, which is not correct because at that point
the current file size in the receiver is 384K, which corresponds to the
end of last processed extent (at file offset 256K), so using a clone
source range from 256K to 256K + 320K is invalid because that goes past
the current size of the file (384K) - this makes the receiver get an
-EINVAL error when attempting the clone operation.

So fix this by excluding clone sources that have a range that goes beyond
the current file size in the receiver when iterating extent backreferences.

A test case for fstests follows soon.

Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same 
file")
CC: sta...@vger.kernel.org # 5.5+
Reviewed-by: Josef Bacik 
Signed-off-by: Filipe Manana 
Signed-off-by: David Sterba 
Signed-off-by: Anand Jain 
---
 fs/btrfs/send.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 48d028b1cfaf..b0e5dfb9be7a 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1270,7 +1270,8 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 
root, void *ctx_)
 * destination of the stream.
 */
if (ino == bctx->cur_objectid &&
-   offset >= bctx->sctx->cur_inode_next_write_offset)
+   offset + bctx->extent_len >
+   bctx->sctx->cur_inode_next_write_offset)
return 0;
}
 
-- 
2.25.1



[PATCH stable-5.4 3/6] btrfs: volumes: Use more straightforward way to calculate map length

2020-10-08 Thread Anand Jain
From: Qu Wenruo 

commit 2d974619a77f106f3d1341686dea95c0eaad601f upstream.

The old code goes:

offset = logical - em->start;
length = min_t(u64, em->len - offset, length);

Where @length calculation is dependent on offset, it can take reader
several more seconds to find it's just the same code as:

offset = logical - em->start;
length = min_t(u64, em->start + em->len - logical, length);

Use above code to make the length calculate independent from other
variable, thus slightly increase the readability.

Reviewed-by: Johannes Thumshirn 
Reviewed-by: Josef Bacik 
Signed-off-by: Qu Wenruo 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba 
Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 196ddbcd2936..1dba45d43485 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5712,7 +5712,7 @@ static int __btrfs_map_block_for_discard(struct 
btrfs_fs_info *fs_info,
}
 
offset = logical - em->start;
-   length = min_t(u64, em->len - offset, length);
+   length = min_t(u64, em->start + em->len - logical, length);
 
stripe_len = map->stripe_len;
/*
-- 
2.25.1



[PATCH stable-5.4 1/6] Btrfs: send, allow clone operations within the same file

2020-10-08 Thread Anand Jain
From: Filipe Manana 

commit 11f2069c113e02971b8db6fda62f9b9cd31a030f upstream.

For send we currently skip clone operations when the source and
destination files are the same. This is so because clone didn't support
this case in its early days, but support for it was added back in May
2013 by commit a96fbc72884fcb ("Btrfs: allow file data clone within a
file"). This change adds support for it.

Example:

  $ mkfs.btrfs -f /dev/sdd
  $ mount /dev/sdd /mnt/sdd

  $ xfs_io -f -c "pwrite -S 0xab -b 64K 0 64K" /mnt/sdd/foobar
  $ xfs_io -c "reflink /mnt/sdd/foobar 0 64K 64K" /mnt/sdd/foobar

  $ btrfs subvolume snapshot -r /mnt/sdd /mnt/sdd/snap

  $ mkfs.btrfs -f /dev/sde
  $ mount /dev/sde /mnt/sde

  $ btrfs send /mnt/sdd/snap | btrfs receive /mnt/sde

Without this change file foobar at the destination has a single 128Kb
extent:

  $ filefrag -v /mnt/sde/snap/foobar
  Filesystem type is: 9123683e
  File size of /mnt/sde/snap/foobar is 131072 (32 blocks of 4096 bytes)
   ext: logical_offset:physical_offset: length:   expected: flags:
 0:0..  31:  0..31: 32: 
last,unknown_loc,delalloc,eof
  /mnt/sde/snap/foobar: 1 extent found

With this we get a single 64Kb extent that is shared at file offsets 0
and 64K, just like in the source filesystem:

  $ filefrag -v /mnt/sde/snap/foobar
  Filesystem type is: 9123683e
  File size of /mnt/sde/snap/foobar is 131072 (32 blocks of 4096 bytes)
   ext: logical_offset:physical_offset: length:   expected: flags:
 0:0..  15:   3328..  3343: 16: shared
 1:   16..  31:   3328..  3343: 16:   3344: 
last,shared,eof
  /mnt/sde/snap/foobar: 2 extents found

Reviewed-by: Josef Bacik 
Signed-off-by: Filipe Manana 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba 
Signed-off-by: Anand Jain 
---
 fs/btrfs/send.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 6ad216e8178e..48d028b1cfaf 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1257,12 +1257,20 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 
root, void *ctx_)
 */
if (found->root == bctx->sctx->send_root) {
/*
-* TODO for the moment we don't accept clones from the inode
-* that is currently send. We may change this when
-* BTRFS_IOC_CLONE_RANGE supports cloning from and to the same
-* file.
+* If the source inode was not yet processed we can't issue a
+* clone operation, as the source extent does not exist yet at
+* the destination of the stream.
 */
-   if (ino >= bctx->cur_objectid)
+   if (ino > bctx->cur_objectid)
+   return 0;
+   /*
+* We clone from the inode currently being sent as long as the
+* source extent is already processed, otherwise we could try
+* to clone from an extent that does not exist yet at the
+* destination of the stream.
+*/
+   if (ino == bctx->cur_objectid &&
+   offset >= bctx->sctx->cur_inode_next_write_offset)
return 0;
}
 
-- 
2.25.1



[PATCH stable-4.14.y] Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space

2020-10-08 Thread Anand Jain
From: Robbie Ko 

[ Upstream commit 8ecebf4d767e2307a946c8905278d6358eda35c3 ]

Commit e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting") forced
nocow writes to fallback to COW, during writeback, when a snapshot is
created. This resulted in writes made before creating the snapshot to
unexpectedly fail with ENOSPC during writeback when success (0) was
returned to user space through the write system call.

The steps leading to this problem are:

1. When it's not possible to allocate data space for a write, the
   buffered write path checks if a NOCOW write is possible.  If it is,
   it will not reserve space and success (0) is returned to user space.

2. Then when a snapshot is created, the root's will_be_snapshotted
   atomic is incremented and writeback is triggered for all inode's that
   belong to the root being snapshotted. Incrementing that atomic forces
   all previous writes to fallback to COW during writeback (running
   delalloc).

3. This results in the writeback for the inodes to fail and therefore
   setting the ENOSPC error in their mappings, so that a subsequent
   fsync on them will report the error to user space. So it's not a
   completely silent data loss (since fsync will report ENOSPC) but it's
   a very unexpected and undesirable behaviour, because if a clean
   shutdown/unmount of the filesystem happens without previous calls to
   fsync, it is expected to have the data present in the files after
   mounting the filesystem again.

So fix this by adding a new atomic named snapshot_force_cow to the
root structure which prevents this behaviour and works the following way:

1. It is incremented when we start to create a snapshot after triggering
   writeback and before waiting for writeback to finish.

2. This new atomic is now what is used by writeback (running delalloc)
   to decide whether we need to fallback to COW or not. Because we
   incremented this new atomic after triggering writeback in the
   snapshot creation ioctl, we ensure that all buffered writes that
   happened before snapshot creation will succeed and not fallback to
   COW (which would make them fail with ENOSPC).

3. The existing atomic, will_be_snapshotted, is kept because it is used
   to force new buffered writes, that start after we started
   snapshotting, to reserve data space even when NOCOW is possible.
   This makes these writes fail early with ENOSPC when there's no
   available space to allocate, preventing the unexpected behaviour of
   writeback later failing with ENOSPC due to a fallback to COW mode.

Fixes: e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting")
Signed-off-by: Robbie Ko 
Reviewed-by: Filipe Manana 
Signed-off-by: David Sterba 
Signed-off-by: Anand Jain 
 Conflicts:
fs/btrfs/disk-io.c
---
 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/disk-io.c |  1 +
 fs/btrfs/inode.c   | 25 -
 fs/btrfs/ioctl.c   | 16 
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index de951987fd23..19a668e9164b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1257,6 +1257,7 @@ struct btrfs_root {
int send_in_progress;
struct btrfs_subvolume_writers *subv_writers;
atomic_t will_be_snapshotted;
+   atomic_t snapshot_force_cow;
 
/* For qgroup metadata space reserve */
atomic64_t qgroup_meta_rsv;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 495430e4f84b..ace58d6a270b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1200,6 +1200,7 @@ static void __setup_root(struct btrfs_root *root, struct 
btrfs_fs_info *fs_info,
refcount_set(>refs, 1);
atomic_set(>will_be_snapshotted, 0);
atomic64_set(>qgroup_meta_rsv, 0);
+   atomic_set(>snapshot_force_cow, 0);
root->log_transid = 0;
root->log_transid_committed = -1;
root->last_log_commit = 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c9e7b92d0f21..e985e820724e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1335,7 +1335,7 @@ static noinline int run_delalloc_nocow(struct inode 
*inode,
u64 disk_num_bytes;
u64 ram_bytes;
int extent_type;
-   int ret, err;
+   int ret;
int type;
int nocow;
int check_prev = 1;
@@ -1460,11 +1460,8 @@ static noinline int run_delalloc_nocow(struct inode 
*inode,
 * if there are pending snapshots for this root,
 * we fall into common COW way.
 */
-   if (!nolock) {
-   err = btrfs_start_write_no_snapshotting(root);
-   if (!err)
-   goto out_check;
-   }
+   if (!nolock && atomic_read(>snapshot_force_cow))
+   goto out_check;
/*

Re: WARNING in close_fs_devices (2)

2020-09-22 Thread Anand Jain

On 18/9/20 7:22 pm, syzbot wrote:

Hello,

syzbot found the following issue on:

HEAD commit:e4c26faa Merge tag 'usb-5.9-rc5' of git://git.kernel.org/p..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15bf162190
kernel config:  https://syzkaller.appspot.com/x/.config?x=c61610091f4ca8c4
dashboard link: https://syzkaller.appspot.com/bug?extid=4cfe71a4da060be47502
compiler:   gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4cfe71a4da060be47...@syzkaller.appspotmail.com

[ cut here ]
WARNING: CPU: 1 PID: 3612 at fs/btrfs/volumes.c:1166 
close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 3612 Comm: syz-executor.2 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x198/0x1fd lib/dump_stack.c:118
  panic+0x347/0x7c0 kernel/panic.c:231
  __warn.cold+0x20/0x46 kernel/panic.c:600
  report_bug+0x1bd/0x210 lib/bug.c:198
  handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234
  exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254
  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
Code: 0f b6 04 02 84 c0 74 02 7e 33 48 8b 44 24 18 c6 80 30 01 00 00 00 48 83 c4 30 
5b 5d 41 5c 41 5d 41 5e 41 5f c3 e8 99 ce 6a fe <0f> 0b e9 71 ff ff ff e8 8d ce 
6a fe 0f 0b e9 20 ff ff ff e8 d1 d5
RSP: 0018:c900091777e0 EFLAGS: 00010246
RAX: 0004 RBX:  RCX: c9000c8b7000
RDX: 0004 RSI: 83097f47 RDI: 0007
RBP: dc00 R08: 0001 R09: 8880988a187f
R10:  R11: 0001 R12: 88809593a130
R13: 88809593a1ec R14: 8880988a1908 R15: 88809593a050
  close_fs_devices fs/btrfs/volumes.c:1193 [inline]
  btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
  open_ctree+0x4984/0x4a2d fs/btrfs/disk-io.c:3434
  btrfs_fill_super fs/btrfs/super.c:1316 [inline]
  btrfs_mount_root.cold+0x14/0x165 fs/btrfs/super.c:1672
  legacy_get_tree+0x105/0x220 fs/fs_context.c:592
  vfs_get_tree+0x89/0x2f0 fs/super.c:1547
  fc_mount fs/namespace.c:978 [inline]
  vfs_kern_mount.part.0+0xd3/0x170 fs/namespace.c:1008
  vfs_kern_mount+0x3c/0x60 fs/namespace.c:995
  btrfs_mount+0x234/0xaa0 fs/btrfs/super.c:1732
  legacy_get_tree+0x105/0x220 fs/fs_context.c:592
  vfs_get_tree+0x89/0x2f0 fs/super.c:1547
  do_new_mount fs/namespace.c:2875 [inline]
  path_mount+0x1387/0x2070 fs/namespace.c:3192
  do_mount fs/namespace.c:3205 [inline]
  __do_sys_mount fs/namespace.c:3413 [inline]
  __se_sys_mount fs/namespace.c:3390 [inline]
  __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x46004a
Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd 89 fb ff c3 66 2e 0f 1f 84 00 
00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da 89 
fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:7f414d78da88 EFLAGS: 0202 ORIG_RAX: 00a5
RAX: ffda RBX: 7f414d78db20 RCX: 0046004a
RDX: 2000 RSI: 2100 RDI: 7f414d78dae0
RBP: 7f414d78dae0 R08: 7f414d78db20 R09: 2000
R10:  R11: 0202 R12: 2000
R13: 2100 R14: 2200 R15: 2001a800
Kernel Offset: disabled
Rebooting in 86400 seconds..



#syz fix: btrfs: fix rw_devices count in __btrfs_free_extra_devids


Re: WARNING in close_fs_devices (2)

2020-09-21 Thread Anand Jain

On 21/9/20 6:42 am, syzbot wrote:

syzbot has found a reproducer for the following issue on:

HEAD commit:b652d2a5 Add linux-next specific files for 20200918
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17e84b0790
kernel config:  https://syzkaller.appspot.com/x/.config?x=3cf0782933432b43
dashboard link: https://syzkaller.appspot.com/bug?extid=4cfe71a4da060be47502
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=112425d990
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1486929b90

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4cfe71a4da060be47...@syzkaller.appspotmail.com

[ cut here ]
WARNING: CPU: 0 PID: 6972 at fs/btrfs/volumes.c:1172 
close_fs_devices+0x715/0x930 fs/btrfs/volumes.c:1172
Modules linked in:
CPU: 1 PID: 6972 Comm: syz-executor044 Not tainted 
5.9.0-rc5-next-20200918-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:close_fs_devices+0x715/0x930 fs/btrfs/volumes.c:1172
Code: e8 00 b8 4c fe 85 db 0f 85 65 f9 ff ff e8 93 bb 4c fe 0f 0b e9 59 f9 ff ff e8 
87 bb 4c fe 0f 0b e9 c0 fe ff ff e8 7b bb 4c fe <0f> 0b e9 f9 fe ff ff 48 c7 c7 
fc a1 8f 8b e8 e8 0b 8e fe e9 19 f9
RSP: 0018:c900061b7758 EFLAGS: 00010293
RAX:  RBX:  RCX: 83285c2c
RDX: 8880a6bbe4c0 RSI: 83285d35 RDI: 0007
RBP: dc00 R08:  R09: 8880a2be1133
R10:  R11:  R12: 8880a2be1130
R13: 8880a2be11ec R14: 888093ab0508 R15: 8880a2be1050
FS:  0208a880() GS:8880ae50() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 004babec CR3: a7bc7000 CR4: 001506e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
  btrfs_close_devices+0x8e/0x4b0 fs/btrfs/volumes.c:1184
  open_ctree+0x492a/0x49cf fs/btrfs/disk-io.c:3381
  btrfs_fill_super fs/btrfs/super.c:1316 [inline]
  btrfs_mount_root.cold+0x14/0x165 fs/btrfs/super.c:1672
  legacy_get_tree+0x105/0x220 fs/fs_context.c:592
  vfs_get_tree+0x89/0x2f0 fs/super.c:1547
  fc_mount fs/namespace.c:983 [inline]
  vfs_kern_mount.part.0+0xd3/0x170 fs/namespace.c:1013
  vfs_kern_mount+0x3c/0x60 fs/namespace.c:1000
  btrfs_mount+0x234/0xaa0 fs/btrfs/super.c:1732
  legacy_get_tree+0x105/0x220 fs/fs_context.c:592
  vfs_get_tree+0x89/0x2f0 fs/super.c:1547
  do_new_mount fs/namespace.c:2896 [inline]
  path_mount+0x12ae/0x1e70 fs/namespace.c:3216
  do_mount fs/namespace.c:3229 [inline]
  __do_sys_mount fs/namespace.c:3437 [inline]
  __se_sys_mount fs/namespace.c:3414 [inline]
  __x64_sys_mount+0x27f/0x300 fs/namespace.c:3414
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x44851a
Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 cd a2 fb ff c3 66 2e 0f 1f 84 00 
00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 aa a2 
fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:7ffcb26bce08 EFLAGS: 0293 ORIG_RAX: 00a5
RAX: ffda RBX: 0004 RCX: 0044851a
RDX: 2000 RSI: 2100 RDI: 7ffcb26bce50
RBP: 7ffcb26bce90 R08: 7ffcb26bce90 R09: 2000
R10:  R11: 0293 R12: 0003
R13: 7ffcb26bce50 R14: 0003 R15: 0001




I am able to reproduce. And its quite strange at the moment. A devid 0 
is being scanned. Looks like crafted image.


[19592.946397] BTRFS: device fsid f90cac8b-044b-4fa8-8bee-4b8d3da88dc2 
devid 0 transid 0 /dev/loop0 scanned by t (70902)





Re: [PATCH -next] btrfs: Make btrfs_sysfs_add_fs_devices static

2020-09-16 Thread Anand Jain





On 16/9/20 10:26 pm, YueHaibing wrote:

Fix sparse warning:

fs/btrfs/sysfs.c:1386:5: warning:
  symbol 'btrfs_sysfs_add_fs_devices' was not declared. Should it be static?



 misc-next branch has it declared static. It was fixed later.

Thanks, Anand




Signed-off-by: YueHaibing 
---
  fs/btrfs/sysfs.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index e7b0e10685d9..279d9262b676 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1383,7 +1383,7 @@ int btrfs_sysfs_add_device(struct btrfs_device *device)
return ret;
  }
  
-int btrfs_sysfs_add_fs_devices(struct btrfs_fs_devices *fs_devices)

+static int btrfs_sysfs_add_fs_devices(struct btrfs_fs_devices *fs_devices)
  {
int ret;
struct btrfs_device *device;



Re: [PATCH -next] btrfs: Remove unused function calc_global_rsv_need_space()

2020-09-09 Thread Anand Jain

On 9/9/20 9:51 pm, YueHaibing wrote:

It is not used since commit 0096420adb03 ("btrfs: do not
account global reserve in can_overcommit")

Signed-off-by: YueHaibing 


Reviewed-by: Anand Jain 

Thanks, Anand



Re: [PATCH -next] btrfs: remove set but not used variable 'offset'

2019-07-02 Thread Anand Jain

On 2/7/19 10:15 PM, YueHaibing wrote:

Fixes gcc '-Wunused-but-set-variable' warning:

fs/btrfs/volumes.c: In function __btrfs_map_block:
fs/btrfs/volumes.c:6023:6: warning:
  variable offset set but not used [-Wunused-but-set-variable]

It is not used any more since commit 343abd1c0ca9 ("btrfs: Use
btrfs_get_io_geometry appropriately")

Reported-by: Hulk Robot 
Signed-off-by: YueHaibing 



Reviewed-by: Anand Jain 



---
  fs/btrfs/volumes.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d1fd910..5d5a9ff 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6020,7 +6020,6 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info,
  {
struct extent_map *em;
struct map_lookup *map;
-   u64 offset;
u64 stripe_offset;
u64 stripe_nr;
u64 stripe_len;
@@ -6055,7 +6054,6 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info,
map = em->map_lookup;
  
  	*length = geom.len;

-   offset = geom.offset;
stripe_len = geom.stripe_len;
stripe_nr = geom.stripe_nr;
stripe_offset = geom.stripe_offset;





Re: [PATCH 3.16 315/410] btrfs: use proper endianness accessors for super_copy

2018-06-07 Thread Anand Jain



Hi Ben,

 This patch was reverted from the mainline and 4.14 stable as it only
 added half of the endianness conversion and caused regression. This
 patch should not be included.

Thanks, Anand


On 06/07/2018 10:05 PM, Ben Hutchings wrote:

3.16.57-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Anand Jain 

commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.

The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value.  This was missing in update_super_roots and in sysfs readers.

Moving between opposite endianness hosts will report bogus numbers in
sysfs, and mount may fail as the root will not be restored correctly. If
the filesystem is always used on a same endian host, this will not be a
problem.

Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.

Fixes: df93589a17378 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain 
Reviewed-by: Liu Bo 
Reviewed-by: David Sterba 
[ update changelog ]
Signed-off-by: David Sterba 
[bwh: Backported to 3.16:
  - btrfs_fs_info doesn't have cached nodesize or sectorsize fields, so use
the accessor functions
  - Adjust context]
Signed-off-by: Ben Hutchings 
---
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -406,7 +406,7 @@ static ssize_t btrfs_nodesize_show(struc
  {
struct btrfs_fs_info *fs_info = to_fs_info(kobj);
  
-	return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->super_copy->nodesize);

+   return snprintf(buf, PAGE_SIZE, "%u\n", 
btrfs_super_nodesize(fs_info->super_copy));
  }
  
  BTRFS_ATTR_RW(nodesize, 0444, btrfs_nodesize_show, btrfs_no_store);

@@ -416,7 +416,7 @@ static ssize_t btrfs_sectorsize_show(str
  {
struct btrfs_fs_info *fs_info = to_fs_info(kobj);
  
-	return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->super_copy->sectorsize);

+   return snprintf(buf, PAGE_SIZE, "%u\n", 
btrfs_super_sectorsize(fs_info->super_copy));
  }
  
  BTRFS_ATTR_RW(sectorsize, 0444, btrfs_sectorsize_show, btrfs_no_store);

@@ -426,7 +426,7 @@ static ssize_t btrfs_clone_alignment_sho
  {
struct btrfs_fs_info *fs_info = to_fs_info(kobj);
  
-	return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->super_copy->sectorsize);

+   return snprintf(buf, PAGE_SIZE, "%u\n", 
btrfs_super_sectorsize(fs_info->super_copy));
  }
  
  BTRFS_ATTR_RW(clone_alignment, 0444, btrfs_clone_alignment_show, btrfs_no_store);

--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1428,19 +1428,23 @@ static void update_super_roots(struct bt
  
  	super = root->fs_info->super_copy;
  
+	/* update latest btrfs_super_block::chunk_root refs */

root_item = >fs_info->chunk_root->root_item;
-   super->chunk_root = root_item->bytenr;
-   super->chunk_root_generation = root_item->generation;
-   super->chunk_root_level = root_item->level;
+   btrfs_set_super_chunk_root(super, root_item->bytenr);
+   btrfs_set_super_chunk_root_generation(super, root_item->generation);
+   btrfs_set_super_chunk_root_level(super, root_item->level);
  
+	/* update latest btrfs_super_block::root refs */

root_item = >fs_info->tree_root->root_item;
-   super->root = root_item->bytenr;
-   super->generation = root_item->generation;
-   super->root_level = root_item->level;
+   btrfs_set_super_root(super, root_item->bytenr);
+   btrfs_set_super_generation(super, root_item->generation);
+   btrfs_set_super_root_level(super, root_item->level);
+
if (btrfs_test_opt(root, SPACE_CACHE))
-   super->cache_generation = root_item->generation;
+   btrfs_set_super_cache_generation(super, root_item->generation);
if (root->fs_info->update_uuid_tree_gen)
-   super->uuid_tree_generation = root_item->generation;
+   btrfs_set_super_uuid_tree_generation(super,
+root_item->generation);
  }
  
  int btrfs_transaction_in_commit(struct btrfs_fs_info *info)




Re: [PATCH 3.16 315/410] btrfs: use proper endianness accessors for super_copy

2018-06-07 Thread Anand Jain



Hi Ben,

 This patch was reverted from the mainline and 4.14 stable as it only
 added half of the endianness conversion and caused regression. This
 patch should not be included.

Thanks, Anand


On 06/07/2018 10:05 PM, Ben Hutchings wrote:

3.16.57-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Anand Jain 

commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.

The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value.  This was missing in update_super_roots and in sysfs readers.

Moving between opposite endianness hosts will report bogus numbers in
sysfs, and mount may fail as the root will not be restored correctly. If
the filesystem is always used on a same endian host, this will not be a
problem.

Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.

Fixes: df93589a17378 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain 
Reviewed-by: Liu Bo 
Reviewed-by: David Sterba 
[ update changelog ]
Signed-off-by: David Sterba 
[bwh: Backported to 3.16:
  - btrfs_fs_info doesn't have cached nodesize or sectorsize fields, so use
the accessor functions
  - Adjust context]
Signed-off-by: Ben Hutchings 
---
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -406,7 +406,7 @@ static ssize_t btrfs_nodesize_show(struc
  {
struct btrfs_fs_info *fs_info = to_fs_info(kobj);
  
-	return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->super_copy->nodesize);

+   return snprintf(buf, PAGE_SIZE, "%u\n", 
btrfs_super_nodesize(fs_info->super_copy));
  }
  
  BTRFS_ATTR_RW(nodesize, 0444, btrfs_nodesize_show, btrfs_no_store);

@@ -416,7 +416,7 @@ static ssize_t btrfs_sectorsize_show(str
  {
struct btrfs_fs_info *fs_info = to_fs_info(kobj);
  
-	return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->super_copy->sectorsize);

+   return snprintf(buf, PAGE_SIZE, "%u\n", 
btrfs_super_sectorsize(fs_info->super_copy));
  }
  
  BTRFS_ATTR_RW(sectorsize, 0444, btrfs_sectorsize_show, btrfs_no_store);

@@ -426,7 +426,7 @@ static ssize_t btrfs_clone_alignment_sho
  {
struct btrfs_fs_info *fs_info = to_fs_info(kobj);
  
-	return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->super_copy->sectorsize);

+   return snprintf(buf, PAGE_SIZE, "%u\n", 
btrfs_super_sectorsize(fs_info->super_copy));
  }
  
  BTRFS_ATTR_RW(clone_alignment, 0444, btrfs_clone_alignment_show, btrfs_no_store);

--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1428,19 +1428,23 @@ static void update_super_roots(struct bt
  
  	super = root->fs_info->super_copy;
  
+	/* update latest btrfs_super_block::chunk_root refs */

root_item = >fs_info->chunk_root->root_item;
-   super->chunk_root = root_item->bytenr;
-   super->chunk_root_generation = root_item->generation;
-   super->chunk_root_level = root_item->level;
+   btrfs_set_super_chunk_root(super, root_item->bytenr);
+   btrfs_set_super_chunk_root_generation(super, root_item->generation);
+   btrfs_set_super_chunk_root_level(super, root_item->level);
  
+	/* update latest btrfs_super_block::root refs */

root_item = >fs_info->tree_root->root_item;
-   super->root = root_item->bytenr;
-   super->generation = root_item->generation;
-   super->root_level = root_item->level;
+   btrfs_set_super_root(super, root_item->bytenr);
+   btrfs_set_super_generation(super, root_item->generation);
+   btrfs_set_super_root_level(super, root_item->level);
+
if (btrfs_test_opt(root, SPACE_CACHE))
-   super->cache_generation = root_item->generation;
+   btrfs_set_super_cache_generation(super, root_item->generation);
if (root->fs_info->update_uuid_tree_gen)
-   super->uuid_tree_generation = root_item->generation;
+   btrfs_set_super_uuid_tree_generation(super,
+root_item->generation);
  }
  
  int btrfs_transaction_in_commit(struct btrfs_fs_info *info)




Re: [btrfs_put_block_group] WARNING: CPU: 1 PID: 14674 at fs/btrfs/disk-io.c:3675 free_fs_root+0xc2/0xd0 [btrfs]

2018-05-03 Thread Anand Jain



On 04/19/2018 03:25 PM, Nikolay Borisov wrote:



On 19.04.2018 08:32, Fengguang Wu wrote:

Hello,

FYI this happens in mainline kernel and at least dates back to v4.16 .

It's rather rare error and happens when running xfstests.


Yeah, so this is something which only recently was characterised as
leaking delalloc inodes. I can easily reproduce this when running
generic/019 test. A fix is in the works.


Hi Nikolay
 Is the fix available? generic/475 also notice the same thing.

-Anand




[  438.327552] BTRFS: error (device dm-0) in __btrfs_free_extent:6962: errno=-5 
IO failure
[  438.336415] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:3070: 
errno=-5 IO failure
[  438.345590] BTRFS error (device dm-0): pending csums is 1028096
[  438.369254] BTRFS error (device dm-0): cleaner transaction attach returned 
-30
[  438.377674] BTRFS info (device dm-0): at unmount delalloc count 98304
[  438.385166] WARNING: CPU: 1 PID: 14674 at fs/btrfs/disk-io.c:3675 
free_fs_root+0xc2/0xd0 [btrfs]
[  438.396562] Modules linked in: dm_snapshot dm_thin_pool dm_persistent_data 
dm_bio_prison dm_bufio dm_flakey dm_mod netconsole btrfs xor zstd_decompress 
zstd_compress xxhash raid6_pq sd_mod sg snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_codec_generic ata_generic pata_acpi intel_rapl 
x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel kvm_intel 
snd_hda_codec kvm irqbypass crct10dif_pclmul eeepc_wmi crc32_pclmul 
crc32c_intel ghash_clmulni_intel pata_via asus_wmi sparse_keymap snd_hda_core 
ata_piix snd_hwdep ppdev rfkill wmi_bmof i915 pcbc snd_pcm snd_timer 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops snd aesni_intel 
parport_pc crypto_simd pcspkr libata soundcore cryptd glue_helper drm wmi 
parport video shpchp ip_tables
[  438.467607] CPU: 1 PID: 14674 Comm: umount Not tainted 4.17.0-rc1 #1
[  438.474798] Hardware name: System manufacturer System Product Name/P8H67-M 
PRO, BIOS 1002 04/01/2011
[  438.484804] RIP: 0010:free_fs_root+0xc2/0xd0 [btrfs]
[  438.490590] RSP: 0018:c90008b0fda8 EFLAGS: 00010282
[  438.496641] RAX: 88017c5954b0 RBX: 880137f6d800 RCX: 00018013
[  438.504652] RDX: 0001 RSI: ea0006e93600 RDI: 
[  438.512679] RBP: 88017b36 R08: 8801ba4dd000 R09: 00018013
[  438.520644] R10: c90008b0fc70 R11:  R12: c90008b0fdd0
[  438.528657] R13: 88017b360080 R14: c90008b0fdc8 R15: 
[  438.536662] FS:  7f06c1a80fc0() GS:8801bfa8() 
knlGS:
[  438.545582] CS:  0010 DS:  ES:  CR0: 80050033
[  438.552157] CR2: 7f06c12b0260 CR3: 00017d3de004 CR4: 000606e0
[  438.642653] RAX:  RBX: 0234a2d0 RCX: 7f06c1359cf7
[  438.793559] CPU: 1 PID: 14674 Comm: umount Tainted: GW 
4.17.0-rc1 #1
[  438.802152] Hardware name: System manufacturer System Product Name/P8H67-M 
PRO, BIOS 1002 04/01/2011
[  438.812108] RIP: 0010:btrfs_put_block_group+0x41/0x60 [btrfs]
[  438.819364] RSP: 0018:c90008b0fde0 EFLAGS: 00010206
[  438.825378] RAX:  RBX: 8801abf63000 RCX: e38e38e38e38e38f
[  438.833307] RDX: 0001 RSI: 09f6 RDI: 8801abf63000
[  438.841230] RBP: 88017b36 R08: 88017d3b7750 R09: 000180380010
[  438.849133] R10: c90008b0fca0 R11:  R12: 8801abf63000
[  438.857047] R13: 88017b3600a0 R14: 8801abf630e0 R15: dead0100
[  438.864943] FS:  7f06c1a80fc0() GS:8801bfa8() 
knlGS:
[  438.873793] CS:  0010 DS:  ES:  CR0: 80050033
[  438.880320] CR2: 7f06c12b0260 CR3: 00017d3de004 CR4: 000606e0
[  438.888226] Call Trace:
[  438.891454]  btrfs_free_block_groups+0x138/0x3d0 [btrfs]
[  438.897569]  close_ctree+0x13b/0x2f0 [btrfs]
[  438.902618]  generic_shutdown_super+0x6c/0x120:
__read_once_size at 
include/linux/compiler.h:188
 (inlined by) list_empty at 
include/linux/list.h:203
 (inlined by) 
generic_shutdown_super at fs/super.c:442
[  438.907801]  kill_anon_super+0xe/0x20:
kill_anon_super at 
fs/super.c:1038
[  438.912223]  btrfs_kill_super+0x13/0x100 [btrfs]
[  438.917598]  deactivate_locked_super+0x3f/0x70:
deactivate_locked_super at 
fs/super.c:320
[  438.922757]  cleanup_mnt+0x3b/0x70:
cleanup_mnt at 
fs/namespace.c:1174
[  438.926879]  task_work_run+0xa3/0xe0:
task_work_run at 
kernel/task_work.c:115 (discriminator 1)
[  438.931205]  exit_to_usermode_loop+0x9e/0xa0:
tracehook_notify_resume at 
include/linux/tracehook.h:191
  

Re: [btrfs_put_block_group] WARNING: CPU: 1 PID: 14674 at fs/btrfs/disk-io.c:3675 free_fs_root+0xc2/0xd0 [btrfs]

2018-05-03 Thread Anand Jain



On 04/19/2018 03:25 PM, Nikolay Borisov wrote:



On 19.04.2018 08:32, Fengguang Wu wrote:

Hello,

FYI this happens in mainline kernel and at least dates back to v4.16 .

It's rather rare error and happens when running xfstests.


Yeah, so this is something which only recently was characterised as
leaking delalloc inodes. I can easily reproduce this when running
generic/019 test. A fix is in the works.


Hi Nikolay
 Is the fix available? generic/475 also notice the same thing.

-Anand




[  438.327552] BTRFS: error (device dm-0) in __btrfs_free_extent:6962: errno=-5 
IO failure
[  438.336415] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:3070: 
errno=-5 IO failure
[  438.345590] BTRFS error (device dm-0): pending csums is 1028096
[  438.369254] BTRFS error (device dm-0): cleaner transaction attach returned 
-30
[  438.377674] BTRFS info (device dm-0): at unmount delalloc count 98304
[  438.385166] WARNING: CPU: 1 PID: 14674 at fs/btrfs/disk-io.c:3675 
free_fs_root+0xc2/0xd0 [btrfs]
[  438.396562] Modules linked in: dm_snapshot dm_thin_pool dm_persistent_data 
dm_bio_prison dm_bufio dm_flakey dm_mod netconsole btrfs xor zstd_decompress 
zstd_compress xxhash raid6_pq sd_mod sg snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_codec_generic ata_generic pata_acpi intel_rapl 
x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel kvm_intel 
snd_hda_codec kvm irqbypass crct10dif_pclmul eeepc_wmi crc32_pclmul 
crc32c_intel ghash_clmulni_intel pata_via asus_wmi sparse_keymap snd_hda_core 
ata_piix snd_hwdep ppdev rfkill wmi_bmof i915 pcbc snd_pcm snd_timer 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops snd aesni_intel 
parport_pc crypto_simd pcspkr libata soundcore cryptd glue_helper drm wmi 
parport video shpchp ip_tables
[  438.467607] CPU: 1 PID: 14674 Comm: umount Not tainted 4.17.0-rc1 #1
[  438.474798] Hardware name: System manufacturer System Product Name/P8H67-M 
PRO, BIOS 1002 04/01/2011
[  438.484804] RIP: 0010:free_fs_root+0xc2/0xd0 [btrfs]
[  438.490590] RSP: 0018:c90008b0fda8 EFLAGS: 00010282
[  438.496641] RAX: 88017c5954b0 RBX: 880137f6d800 RCX: 00018013
[  438.504652] RDX: 0001 RSI: ea0006e93600 RDI: 
[  438.512679] RBP: 88017b36 R08: 8801ba4dd000 R09: 00018013
[  438.520644] R10: c90008b0fc70 R11:  R12: c90008b0fdd0
[  438.528657] R13: 88017b360080 R14: c90008b0fdc8 R15: 
[  438.536662] FS:  7f06c1a80fc0() GS:8801bfa8() 
knlGS:
[  438.545582] CS:  0010 DS:  ES:  CR0: 80050033
[  438.552157] CR2: 7f06c12b0260 CR3: 00017d3de004 CR4: 000606e0
[  438.642653] RAX:  RBX: 0234a2d0 RCX: 7f06c1359cf7
[  438.793559] CPU: 1 PID: 14674 Comm: umount Tainted: GW 
4.17.0-rc1 #1
[  438.802152] Hardware name: System manufacturer System Product Name/P8H67-M 
PRO, BIOS 1002 04/01/2011
[  438.812108] RIP: 0010:btrfs_put_block_group+0x41/0x60 [btrfs]
[  438.819364] RSP: 0018:c90008b0fde0 EFLAGS: 00010206
[  438.825378] RAX:  RBX: 8801abf63000 RCX: e38e38e38e38e38f
[  438.833307] RDX: 0001 RSI: 09f6 RDI: 8801abf63000
[  438.841230] RBP: 88017b36 R08: 88017d3b7750 R09: 000180380010
[  438.849133] R10: c90008b0fca0 R11:  R12: 8801abf63000
[  438.857047] R13: 88017b3600a0 R14: 8801abf630e0 R15: dead0100
[  438.864943] FS:  7f06c1a80fc0() GS:8801bfa8() 
knlGS:
[  438.873793] CS:  0010 DS:  ES:  CR0: 80050033
[  438.880320] CR2: 7f06c12b0260 CR3: 00017d3de004 CR4: 000606e0
[  438.888226] Call Trace:
[  438.891454]  btrfs_free_block_groups+0x138/0x3d0 [btrfs]
[  438.897569]  close_ctree+0x13b/0x2f0 [btrfs]
[  438.902618]  generic_shutdown_super+0x6c/0x120:
__read_once_size at 
include/linux/compiler.h:188
 (inlined by) list_empty at 
include/linux/list.h:203
 (inlined by) 
generic_shutdown_super at fs/super.c:442
[  438.907801]  kill_anon_super+0xe/0x20:
kill_anon_super at 
fs/super.c:1038
[  438.912223]  btrfs_kill_super+0x13/0x100 [btrfs]
[  438.917598]  deactivate_locked_super+0x3f/0x70:
deactivate_locked_super at 
fs/super.c:320
[  438.922757]  cleanup_mnt+0x3b/0x70:
cleanup_mnt at 
fs/namespace.c:1174
[  438.926879]  task_work_run+0xa3/0xe0:
task_work_run at 
kernel/task_work.c:115 (discriminator 1)
[  438.931205]  exit_to_usermode_loop+0x9e/0xa0:
tracehook_notify_resume at 
include/linux/tracehook.h:191
  

Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-20 Thread Anand Jain



On 03/20/2018 03:32 AM, David Sterba wrote:

On Sat, Mar 17, 2018 at 06:27:15PM +0100, Christoph Biedl wrote:

Greg Kroah-Hartman wrote...


On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:



commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.



On big-endian systems, this change intruduces severe corruption,
resulting in complete loss of the data on the used block device.



That sucks.  Can you test Linus's tree to verify the problem is there?
I'll gladly revert this if Linus's tree also gets the revert, I don't
want you to hit this when you upgrade to a newer kernel.


Confirmed: The problem is, err ... was in Linus' tree as well. The
rather recent commit 8f5fd927c3a7 reverted the change, after that
everything is as expected again.


Thanks for checking.


Looking at the original commit, I don't have a clue why things go wrong
so horribly


It's a half endianness conversion. The plain in-memory structures are in
LE and has to be accessed via the helpers, super block copy and the
root item.  The commit adds only one half of the conversion, that
naturally does not exhibit on LE, because the macros are no-op.

Originally, the items were stored from the on-disk type to on-disk type,
regardless of the CPU:

   super->chunk_root = root_item->bytenr;

The patch should have added conversion of both values, like

   btrfs_set_super_chunk_root(super, btrfs_root_bytenr(root_item));

and not

   btrfs_set_super_chunk_root(super, root_item->bytenr);

It's not very obvious from the context though, typically there's one
structure that needs the accessors and is set from a value that's in CPU
byteorder. I think that the exception to this pattern confused all
involved developers.

The root_item members are annotated as __le64 that should be caught by
sparse/smatch checker in the buggy case, but we don't run the checkers
every the time.


 Ah! the RC is at the other side of the equation.
 Makes sense to me. Thanks.


- otherwise don't be afraid of my data. I took this as a
chance to verify my data recovery procedure, with success.


Should that not be the case, the damaged items in superblock can be
byteswapped back. That's 6 x u64 at most, I have a tool for that now.

Thanks again for the report and sorry for the trouble.


 It's entirely my mistake. My apologies for the inconvenience.

Thanks, Anand



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-20 Thread Anand Jain



On 03/20/2018 03:32 AM, David Sterba wrote:

On Sat, Mar 17, 2018 at 06:27:15PM +0100, Christoph Biedl wrote:

Greg Kroah-Hartman wrote...


On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:



commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.



On big-endian systems, this change intruduces severe corruption,
resulting in complete loss of the data on the used block device.



That sucks.  Can you test Linus's tree to verify the problem is there?
I'll gladly revert this if Linus's tree also gets the revert, I don't
want you to hit this when you upgrade to a newer kernel.


Confirmed: The problem is, err ... was in Linus' tree as well. The
rather recent commit 8f5fd927c3a7 reverted the change, after that
everything is as expected again.


Thanks for checking.


Looking at the original commit, I don't have a clue why things go wrong
so horribly


It's a half endianness conversion. The plain in-memory structures are in
LE and has to be accessed via the helpers, super block copy and the
root item.  The commit adds only one half of the conversion, that
naturally does not exhibit on LE, because the macros are no-op.

Originally, the items were stored from the on-disk type to on-disk type,
regardless of the CPU:

   super->chunk_root = root_item->bytenr;

The patch should have added conversion of both values, like

   btrfs_set_super_chunk_root(super, btrfs_root_bytenr(root_item));

and not

   btrfs_set_super_chunk_root(super, root_item->bytenr);

It's not very obvious from the context though, typically there's one
structure that needs the accessors and is set from a value that's in CPU
byteorder. I think that the exception to this pattern confused all
involved developers.

The root_item members are annotated as __le64 that should be caught by
sparse/smatch checker in the buggy case, but we don't run the checkers
every the time.


 Ah! the RC is at the other side of the equation.
 Makes sense to me. Thanks.


- otherwise don't be afraid of my data. I took this as a
chance to verify my data recovery procedure, with success.


Should that not be the case, the damaged items in superblock can be
byteswapped back. That's 6 x u64 at most, I have a tool for that now.

Thanks again for the report and sorry for the trouble.


 It's entirely my mistake. My apologies for the inconvenience.

Thanks, Anand



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-16 Thread Anand Jain



On 03/16/2018 10:02 PM, Greg Kroah-Hartman wrote:

On Fri, Mar 16, 2018 at 02:22:02PM +0100, David Sterba wrote:

On Fri, Mar 16, 2018 at 01:30:49PM +0100, Greg Kroah-Hartman wrote:

On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:

Greg Kroah-Hartman wrote...


4.14-stable review patch.  If anyone has any objections, please let me know.



commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.

(...)


If the filesystem is always used on a same endian host, this will not
be a problem.


>From my observations I cannot quite subscribe to that.

On big-endian systems, this change intruduces severe corruption,
resulting in complete loss of the data on the used block device.

Steps to reproduce (tested on ppc/powerpc and parisc/hppa):

# mkfs.btrfs $DEV
# mount $DEV /mnt/tmp/
# umount /mnt/tmp/

This simple umount corrupts the file system:

# mount $DEV /mnt/tmp/
mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing 
codepage or helper program, or other error.

# dmesg:
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 18102363734671360 length 
16384
BTRFS error (device ): failed to read chunk root
BTRFS error (device ): open_ctree failed

Also fsck is of no help:

# btrfsck $DEV
Couldn't map the block 18102363734671360
No mapping for 18102363734671360-18102363734687744
Couldn't map the block 18102363734671360
bytenr mismatch, want=18102363734671360, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system


Trying mount or fsck on a little-endian system does not help either. So
I consider the data on that device lost - luckily I use btrfs only for
files where a backup exists all the time.


Reverting that change restored the previous error-free behaviour. I
didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last
that affected these files. Still I could give this a try if anybody
wishes so.


That sucks.  Can you test Linus's tree to verify the problem is there?
I'll gladly revert this if Linus's tree also gets the revert, I don't
want you to hit this when you upgrade to a newer kernel.


I'll push a fix for the upcoming rc but I think it would be better to
remove the broken patch from stable kernels ASAP, so I'd recommend to
revert it now.


Now reverted, thanks.


 Thanks ! Sorry for the mess.

-Anand


greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-16 Thread Anand Jain



On 03/16/2018 10:02 PM, Greg Kroah-Hartman wrote:

On Fri, Mar 16, 2018 at 02:22:02PM +0100, David Sterba wrote:

On Fri, Mar 16, 2018 at 01:30:49PM +0100, Greg Kroah-Hartman wrote:

On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:

Greg Kroah-Hartman wrote...


4.14-stable review patch.  If anyone has any objections, please let me know.



commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.

(...)


If the filesystem is always used on a same endian host, this will not
be a problem.


>From my observations I cannot quite subscribe to that.

On big-endian systems, this change intruduces severe corruption,
resulting in complete loss of the data on the used block device.

Steps to reproduce (tested on ppc/powerpc and parisc/hppa):

# mkfs.btrfs $DEV
# mount $DEV /mnt/tmp/
# umount /mnt/tmp/

This simple umount corrupts the file system:

# mount $DEV /mnt/tmp/
mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing 
codepage or helper program, or other error.

# dmesg:
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 18102363734671360 length 
16384
BTRFS error (device ): failed to read chunk root
BTRFS error (device ): open_ctree failed

Also fsck is of no help:

# btrfsck $DEV
Couldn't map the block 18102363734671360
No mapping for 18102363734671360-18102363734687744
Couldn't map the block 18102363734671360
bytenr mismatch, want=18102363734671360, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system


Trying mount or fsck on a little-endian system does not help either. So
I consider the data on that device lost - luckily I use btrfs only for
files where a backup exists all the time.


Reverting that change restored the previous error-free behaviour. I
didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last
that affected these files. Still I could give this a try if anybody
wishes so.


That sucks.  Can you test Linus's tree to verify the problem is there?
I'll gladly revert this if Linus's tree also gets the revert, I don't
want you to hit this when you upgrade to a newer kernel.


I'll push a fix for the upcoming rc but I think it would be better to
remove the broken patch from stable kernels ASAP, so I'd recommend to
revert it now.


Now reverted, thanks.


 Thanks ! Sorry for the mess.

-Anand


greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-16 Thread Anand Jain



On 03/16/2018 02:55 AM, Christoph Biedl wrote:

Greg Kroah-Hartman wrote...


4.14-stable review patch.  If anyone has any objections, please let me know.



commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.

(...)


If the filesystem is always used on a same endian host, this will not
be a problem.


 From my observations I cannot quite subscribe to that.

On big-endian systems, this change intruduces severe corruption,
resulting in complete loss of the data on the used block device.


Thanks for the report.

That's really bad, my mistake. I am digging to know how it happened. 
Our on-disk root bytenr are little-endian compatible.  So using the 
cpu_to_le for write on a big-endian arch is a correct thing to do. If it 
fails,  certainly there is something which I have overlooked. I am 
digging to know. Thanks for the report again.


Fsck won't be able to figure out the correct on-disk btyenr either.

If there isn't any backup we could try to find out the correct pointers 
manually.  However, restore from the backup approach is much better.


-Anand



Steps to reproduce (tested on ppc/powerpc and parisc/hppa):

# mkfs.btrfs $DEV
# mount $DEV /mnt/tmp/
# umount /mnt/tmp/

This simple umount corrupts the file system:

# mount $DEV /mnt/tmp/
mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing 
codepage or helper program, or other error.

# dmesg:
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 18102363734671360 length 
16384
BTRFS error (device ): failed to read chunk root
BTRFS error (device ): open_ctree failed

Also fsck is of no help:

# btrfsck $DEV
Couldn't map the block 18102363734671360
No mapping for 18102363734671360-18102363734687744
Couldn't map the block 18102363734671360
bytenr mismatch, want=18102363734671360, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system


Trying mount or fsck on a little-endian system does not help either. So
I consider the data on that device lost - luckily I use btrfs only for
files where a backup exists all the time.


Reverting that change restored the previous error-free behaviour. I
didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last
that affected these files. Still I could give this a try if anybody
wishes so.

Cheers,

 Christoph



Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-16 Thread Anand Jain



On 03/16/2018 02:55 AM, Christoph Biedl wrote:

Greg Kroah-Hartman wrote...


4.14-stable review patch.  If anyone has any objections, please let me know.



commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.

(...)


If the filesystem is always used on a same endian host, this will not
be a problem.


 From my observations I cannot quite subscribe to that.

On big-endian systems, this change intruduces severe corruption,
resulting in complete loss of the data on the used block device.


Thanks for the report.

That's really bad, my mistake. I am digging to know how it happened. 
Our on-disk root bytenr are little-endian compatible.  So using the 
cpu_to_le for write on a big-endian arch is a correct thing to do. If it 
fails,  certainly there is something which I have overlooked. I am 
digging to know. Thanks for the report again.


Fsck won't be able to figure out the correct on-disk btyenr either.

If there isn't any backup we could try to find out the correct pointers 
manually.  However, restore from the backup approach is much better.


-Anand



Steps to reproduce (tested on ppc/powerpc and parisc/hppa):

# mkfs.btrfs $DEV
# mount $DEV /mnt/tmp/
# umount /mnt/tmp/

This simple umount corrupts the file system:

# mount $DEV /mnt/tmp/
mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing 
codepage or helper program, or other error.

# dmesg:
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 18102363734671360 length 
16384
BTRFS error (device ): failed to read chunk root
BTRFS error (device ): open_ctree failed

Also fsck is of no help:

# btrfsck $DEV
Couldn't map the block 18102363734671360
No mapping for 18102363734671360-18102363734687744
Couldn't map the block 18102363734671360
bytenr mismatch, want=18102363734671360, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system


Trying mount or fsck on a little-endian system does not help either. So
I consider the data on that device lost - luckily I use btrfs only for
files where a backup exists all the time.


Reverting that change restored the previous error-free behaviour. I
didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last
that affected these files. Still I could give this a try if anybody
wishes so.

Cheers,

 Christoph



Re: [PATCH] f2fs: support multiple devices

2016-11-10 Thread Anand Jain



(this is deviating from the subject, sorry about that)


 Pretty much, if you're using just raid1 mode, without
compression, on reasonable storage devices, things are rock-solid
relative to the rest of BTRFS.


IMO, BTRFS volume manger feature is incomplete and there is RAID1
critical bug which affects availability, so its not suitable for 
enterprise solutions yet. However it should be fine in a setup

where dedicated sysadmin and maintenance downtime is a choice.


Re: [PATCH] f2fs: support multiple devices

2016-11-10 Thread Anand Jain



(this is deviating from the subject, sorry about that)


 Pretty much, if you're using just raid1 mode, without
compression, on reasonable storage devices, things are rock-solid
relative to the rest of BTRFS.


IMO, BTRFS volume manger feature is incomplete and there is RAID1
critical bug which affects availability, so its not suitable for 
enterprise solutions yet. However it should be fine in a setup

where dedicated sysadmin and maintenance downtime is a choice.


Re: [lkp] [btrfs] 164a91c9be: xfstests.generic.088.fail

2016-06-21 Thread Anand Jain



Thanks for the report.

> _check_btrfs_filesystem: filesystem on /dev/sda5 is inconsistent (see 
/lkp/benchmarks/xfstests/results//check.full)

> generic/088 0s

Ran tests again unfortunately I couldn't reproduce..
-
generic/088 0s ... 0s
generic/089 21s ... 39s
generic/090 4s ... 4s
generic/091 58s ... 58s
generic/092 1s ... 0s
generic/096  [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/098 2s ... 2s
generic/100 18s ... 18s
generic/101  3s
generic/102  8s
generic/103  2s
--

But a possible scenario .. in systems where rcu_callback itself
could be deferred it needs rcu_barrier(). Fixed this in V2, by
restoring rcu_barrier() and then grace period wait for devices
to close.



+mount: /dev/sda6 is already mounted or /fs/scratch busy

::

+umount: /dev/sda6: not mounted


This might be different issue, when this happens generally
wipefs all scratch pool devices helps.


Thanks, Anand





Re: [lkp] [btrfs] 164a91c9be: xfstests.generic.088.fail

2016-06-21 Thread Anand Jain



Thanks for the report.

> _check_btrfs_filesystem: filesystem on /dev/sda5 is inconsistent (see 
/lkp/benchmarks/xfstests/results//check.full)

> generic/088 0s

Ran tests again unfortunately I couldn't reproduce..
-
generic/088 0s ... 0s
generic/089 21s ... 39s
generic/090 4s ... 4s
generic/091 58s ... 58s
generic/092 1s ... 0s
generic/096  [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/098 2s ... 2s
generic/100 18s ... 18s
generic/101  3s
generic/102  8s
generic/103  2s
--

But a possible scenario .. in systems where rcu_callback itself
could be deferred it needs rcu_barrier(). Fixed this in V2, by
restoring rcu_barrier() and then grace period wait for devices
to close.



+mount: /dev/sda6 is already mounted or /fs/scratch busy

::

+umount: /dev/sda6: not mounted


This might be different issue, when this happens generally
wipefs all scratch pool devices helps.


Thanks, Anand





Re: [PATCH 1/1] Btrfs: Code Cleanup

2016-03-21 Thread Anand Jain



Hi Flex,



diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 366b335..5c16f04 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2325,7 +2325,10 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
if (seeding_dev) {
sb->s_flags &= ~MS_RDONLY;


 This is not undone in the failure code. Theoretically
 it should report error during unmount, did you notice ?

 (in general, $subject can be more specific).

Thanks, Anand


ret = btrfs_prepare_sprout(root);
-   BUG_ON(ret); /* -ENOMEM */
+   if (ret) {
+   btrfs_abort_transaction(trans, root, ret);
+   goto error_trans;
+   }
}

device->fs_devices = root->fs_info->fs_devices;



Re: [PATCH 1/1] Btrfs: Code Cleanup

2016-03-21 Thread Anand Jain



Hi Flex,



diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 366b335..5c16f04 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2325,7 +2325,10 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
if (seeding_dev) {
sb->s_flags &= ~MS_RDONLY;


 This is not undone in the failure code. Theoretically
 it should report error during unmount, did you notice ?

 (in general, $subject can be more specific).

Thanks, Anand


ret = btrfs_prepare_sprout(root);
-   BUG_ON(ret); /* -ENOMEM */
+   if (ret) {
+   btrfs_abort_transaction(trans, root, ret);
+   goto error_trans;
+   }
}

device->fs_devices = root->fs_info->fs_devices;