Performing verification for Bionic.

I installed 4.15.0-151-generic from -updates, and added the reproducer
btrfs qcow2 image file to my VM.

From there, I mounted the filesystem and attempted to balance:

$ sudo mount /dev/vdb /mnt
$ sudo btrfs filesystem balance start --full-balance /mnt
Segmentation fault (core dumped)

Checking dmesg, we get the same oops as I reported:

https://paste.ubuntu.com/p/wWHCgzZxTZ/
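
For anyone scripting this check, the "Segmentation fault" above can also be detected from the exit status. The following is only a sketch: describe_exit is a hypothetical helper name, and it relies on the usual shell convention that a status above 128 means the command was killed by signal (status - 128), so 139 corresponds to SIGSEGV.

```shell
# Hypothetical helper: run a command and describe how it ended.
# The command's own output is sent to stderr so that stdout carries only
# the one-line description.
describe_exit() {
    status=0
    "$@" >&2 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # Shell convention: 128 + signal number (139 = 128 + 11, SIGSEGV).
        echo "killed by signal $((status - 128))"
    else
        echo "failed with status $status"
    fi
}

# Example (needs root and the reproducer volume mounted at /mnt):
#   describe_exit sudo btrfs filesystem balance start --full-balance /mnt
```

On the affected kernel this would be expected to report a signal, assuming sudo passes the child's termination status through unchanged.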

I then enabled -proposed and installed 4.15.0-152-generic, and rebooted:

$ sudo mount /dev/vdb /mnt
$ sudo btrfs filesystem balance start --full-balance /mnt
ERROR: error during balancing '/mnt': No space left on device
$ dmesg | tail -n 7
[   34.131066] BTRFS info (device vdb): disk space caching is enabled
[   34.131070] BTRFS info (device vdb): has skinny extents
[   34.149906] BTRFS info (device vdb): checking UUID tree
[   34.149946] BTRFS info (device vdb): continuing balance
[   34.227645] BTRFS info (device vdb): 2 enospc errors during balance
[   40.009032] BTRFS info (device vdb): relocating block group 27995340800 flags data
[   40.200573] BTRFS info (device vdb): 14 enospc errors during balance

We no longer suffer a kernel oops, and instead, we correctly report that
the disk is too full and a balance cannot be completed.
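
The two outcomes can be told apart mechanically from the kernel log. This is a minimal sketch, not a complete oops detector: classify_balance_log is a hypothetical helper, and the patterns are just the strings that appear in the oops and in the dmesg output quoted in this report.

```shell
# Hypothetical helper: read kernel log text on stdin and report which of the
# two outcomes from this report it contains.
classify_balance_log() {
    log=$(cat)
    if printf '%s\n' "$log" |
        grep -q -e 'general protection fault' -e 'RIP: .*btrfs_set_root_node'; then
        # Signature of the oops seen on the affected kernel.
        echo "oops"
    elif printf '%s\n' "$log" | grep -q 'enospc errors during balance'; then
        # Balance gave up cleanly because the filesystem is too full.
        echo "enospc"
    else
        echo "unknown"
    fi
}

# Example (reading the kernel log may need root):
#   sudo dmesg | classify_balance_log
```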

After deleting some files and re-issuing balances, balancing completes
successfully.

$ sudo btrfs filesystem df /mnt
Data, single: total=4.88GiB, used=4.51GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=256.00MiB, used=5.39MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
$ sudo btrfs filesystem balance start --full-balance /mnt
Done, had to relocate 8 out of 8 chunks
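
To judge how full each allocation is before retrying a balance, the output above can be reduced to percentages. A rough sketch follows, assuming the "total=<size>, used=<size>" layout shown here with B/KiB/MiB/GiB/TiB suffixes; df_usage_percent is a hypothetical helper name.

```shell
# Hypothetical helper: print "<type> <percent used>" for each line of
# `btrfs filesystem df` output read from stdin.
df_usage_percent() {
    awk '
        # Convert a size such as "4.88GiB" or "0.00B" to bytes.
        function to_bytes(s,    n, unit) {
            n = s + 0                      # leading numeric part
            unit = s
            sub(/^[0-9.]+/, "", unit)      # what remains is the suffix
            if (unit == "KiB") return n * 1024
            if (unit == "MiB") return n * 1024 ^ 2
            if (unit == "GiB") return n * 1024 ^ 3
            if (unit == "TiB") return n * 1024 ^ 4
            return n                       # plain bytes ("B")
        }
        /total=.*used=/ {
            name = $1; sub(/,$/, "", name)
            total = $0; sub(/.*total=/, "", total); sub(/,.*/, "", total)
            used = $0; sub(/.*used=/, "", used)
            t = to_bytes(total); u = to_bytes(used)
            if (t > 0) printf "%s %.1f\n", name, 100 * u / t
        }
    '
}

# Example:
#   sudo btrfs filesystem df /mnt | df_usage_percent
```

For the Data line above (4.51GiB used of 4.88GiB) this works out to roughly 92%.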

The kernel in -proposed fixes the problem, happy to mark Bionic as
verified.
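
The version boundaries in this report (4.15.0-108 and earlier fine, 4.15.0-109 first affected, 4.15.0-151 still affected, 4.15.0-152 fixed) can be turned into a quick check. This is a sketch under two assumptions that the report does not state outright: that every ABI between 109 and 151 is affected, and that kernels after 152 keep the fix. bionic_kernel_status is a hypothetical helper and only understands 4.15.0-<abi>-generic strings.

```shell
# Hypothetical helper: classify a Bionic 4.15.0-<abi>-generic version string
# using the boundaries given in this report.
bionic_kernel_status() {
    release=$1
    case $release in
        4.15.0-*-generic) ;;
        *) echo "not covered"; return 0 ;;
    esac
    abi=${release#4.15.0-}      # strip the "4.15.0-" prefix
    abi=${abi%-generic}         # strip the "-generic" suffix
    case $abi in
        ''|*[!0-9]*) echo "not covered"; return 0 ;;
    esac
    if [ "$abi" -lt 109 ]; then
        echo "not affected"
    elif [ "$abi" -le 151 ]; then
        echo "affected"         # assumption: the whole 109..151 range
    else
        echo "fixed"            # assumption: the fix is kept after 152
    fi
}

# Example:
#   bionic_kernel_status "$(uname -r)"
```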

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1933172

Title:
  btrfs: Attempting to balance a nearly full filesystem with relocated
  root nodes fails

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1933172

  [Impact]

  If you attempt to balance a nearly full btrfs filesystem on which many
  small, medium and large files have been created and deleted, such that
  the b-tree needs to be rotated, the kernel oopses and the btrfs
  filesystem hangs when the balance fails for lack of free space.

  It doesn't appear to cause any filesystem corruption, and is
  reproducible every time on affected filesystems.

  The following oops is generated:

  general protection fault: 0000 [#1] SMP PTI
  CPU: 0 PID: 18440 Comm: btrfs Not tainted 4.15.0-136-generic #140-Ubuntu
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
  RIP: 0010:btrfs_set_root_node+0x5/0x60 [btrfs]
  RSP: 0018:ffffb3db890a79e0 EFLAGS: 00010282
  RAX: ffff8d7f73861ad0 RBX: ffff8d7f78455708 RCX: ffff8d7f6d9a5390
  RDX: ffff8d7f73861ad0 RSI: a023775cfc0348a3 RDI: ffff8d7f6d9a5028
  RBP: ffffb3db890a7a78 R08: 0000000000000044 R09: 0000000000000228
  R10: ffff8d7f6d9a5000 R11: 0000000000000010 R12: ffffb3db890a7a08
  R13: ffff8d7f6d9a5000 R14: ffff8d7f6d9a5028 R15: ffff8d7f74560000
  FS:  00007f48d84498c0(0000) GS:ffff8d7f7fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fe4fbc1f000 CR3: 00000001799fc001 CR4: 0000000000160ef0
  Call Trace:
   ? commit_fs_roots+0x130/0x1b0 [btrfs]
   ? btrfs_run_delayed_refs.part.70+0x80/0x190 [btrfs]
   btrfs_commit_transaction+0x42c/0x910 [btrfs]
   ? start_transaction+0x191/0x430 [btrfs]
   relocate_block_group+0x1e7/0x640 [btrfs]
   btrfs_relocate_block_group+0x18f/0x280 [btrfs]
   btrfs_relocate_chunk+0x38/0xd0 [btrfs]
   __btrfs_balance+0x972/0xcd0 [btrfs]
   ? insert_balance_item.isra.35+0x391/0x3c0 [btrfs]
   btrfs_balance+0x32c/0x5a0 [btrfs]
   btrfs_ioctl_balance+0x320/0x390 [btrfs]
   btrfs_ioctl+0x5a6/0x2490 [btrfs]
   ? lru_cache_add_active_or_unevictable+0x36/0xb0
   ? __handle_mm_fault+0x9fd/0x1290
   do_vfs_ioctl+0xa8/0x630
   ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
   ? do_vfs_ioctl+0xa8/0x630
   ? __do_page_fault+0x2a1/0x4b0
   SyS_ioctl+0x79/0x90
   do_syscall_64+0x73/0x130
   entry_SYSCALL_64_after_hwframe+0x41/0xa6
  RIP: 0033:0x7f48d7228317
  RSP: 002b:00007ffd76d03e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f48d7228317
  RDX: 00007ffd76d03ec8 RSI: 00000000c4009420 RDI: 0000000000000003
  RBP: 00007ffd76d03ec8 R08: 0000000000000078 R09: 0000000000000000
  R10: 0000562086e7f010 R11: 0000000000000246 R12: 0000000000000003
  R13: 00007ffd76d057cb R14: 0000000000000002 R15: 0000000000000000
  Code: 4d 85 e4 0f 84 56 fe ff ff 4d 89 04 24 41 c6 44 24 08 84 4d 89 4c 24 09 e9 42 fe ff ff 0f 0b e8 02 24 5e e0 66 90 0f 1f 44 00 00 <48> 8b 06 48 8b 0d c9 d4 99 e1 48 8b 15 d2 d4 99 e1 55 48 89 87
  RIP: btrfs_set_root_node+0x5/0x60 [btrfs] RSP: ffffb3db890a79e0

  I don't see this behaviour on any upstream kernel, and the first
  kernel to show this behaviour is 4.15.0-109-generic. The current
  4.15.0-145-generic is still affected.

  I believe that this is a regression introduced in the fixing of
  CVE-2019-19036.

  [Testcase]

  I haven't reliably been able to create a script which places a btrfs
  filesystem into the state necessary to reproduce this issue, so I have
  just provided my qcow2 image with my btrfs filesystem which reproduces
  the issue 100% of the time.

  Download the image from here (warning: the download is 8.0 GB):

  https://people.canonical.com/~mruffell/sf311164/ubuntu18.04-server-2.qcow2

  Create an Ubuntu 18.04 VM. Attach the ubuntu18.04-server-2.qcow2 image
  to a new virtio disk. Note that ubuntu18.04-server-2.qcow2 does not
  contain an operating system; it is a data-only volume.

  Mount the volume:

  $ sudo mount /dev/vdb /mnt

  Attempt to balance:

  $ sudo btrfs filesystem balance start --full-balance /mnt
  Segmentation fault (core dumped)

  Check dmesg for kernel oops:
  https://paste.ubuntu.com/p/wjJNqKBCfh/

  If you install the test kernel from the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf311164-test

  You should see this instead:

  $ sudo btrfs filesystem balance start --full-balance /mnt
  ERROR: error during balancing '/mnt': No space left on device
  There may be more info in syslog - try dmesg | tail

  Checking dmesg shows no kernel oops, and just info about the volume
  being too full to balance:

  https://paste.ubuntu.com/p/4J8Gq2dtz4/

  [Fix]

  The problem first appears in 4.15.0-109-generic; 4.15.0-108-generic
  and earlier work fine, so the regression was introduced between those
  two releases.

  I bisected the problem down to the following commit:

  ubuntu-bionic 6f536ce7a978531d38a21d092394616cefb54436
  Author: Qu Wenruo <w...@suse.com>
  Date:   Tue May 19 10:13:20 2020 +0800
  Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
  Link: https://paste.ubuntu.com/p/4qfWCM8ykh/

  Unfortunately, I believe this is a bad backport. If you examine the
  original upstream commit:

  commit 51415b6c1b117e223bc083e30af675cb5c5498f3
  Author: Qu Wenruo <w...@suse.com>
  Date:   Tue May 19 10:13:20 2020 +0800
  Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
  Link: https://github.com/torvalds/linux/commit/51415b6c1b117e223bc083e30af675cb5c5498f3

  You will see the 4.15 backport has calls to free_extent_buffer() and
  btrfs_put_fs_root(). Now, btrfs_put_fs_root() was renamed to
  btrfs_put_root() in the newer patches, and contains logic to free
  relocated roots, so I think we might not need the calls to
  free_extent_buffer() to free the extents first, since it might be
  handled later.

  The core issue is that we hit a general protection fault when
  attempting to access a root node, which means we have freed a root
  node we shouldn't have.

  If we look at the backport in 5.4.y, aka, the one in Focal:

  ubuntu-focal ecaee3a76ea998bc2fe20f056eb27f9bc837d116
  Author: Qu Wenruo <w...@suse.com>
  Date:   Tue May 19 10:13:20 2020 +0800
  Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
  Link: https://paste.ubuntu.com/p/PZrMqVt8Yk/

  It seems upstream -stable omitted the calls to btrfs_put_root()
  entirely, and we don't need the calls to free_extent_buffer() because
  of it.

  If I revert 6f536ce7a978531d38a21d092394616cefb54436 from
  ubuntu-bionic, cherry-pick ecaee3a76ea998bc2fe20f056eb27f9bc837d116
  from ubuntu-focal, and build, the problem no longer reproduces.
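
  The revert and cherry-pick could be sketched as below. The commit IDs
  are the ones quoted above; everything else is an assumption: the
  commands are meant to be run inside an ubuntu-bionic checkout that has
  already fetched the ubuntu-focal history, and print_fix_steps is a
  hypothetical helper that only prints the commands so they can be
  reviewed before being run by hand.

```shell
# Commit IDs taken from the bug description.
BAD_BACKPORT=6f536ce7a978531d38a21d092394616cefb54436
FOCAL_BACKPORT=ecaee3a76ea998bc2fe20f056eb27f9bc837d116

# Hypothetical helper: print the two git commands rather than running them.
# Assumption: the working tree is ubuntu-bionic and the ubuntu-focal commit
# is already present in the local repository.
print_fix_steps() {
    echo "git revert $BAD_BACKPORT"
    echo "git cherry-pick $FOCAL_BACKPORT"
}

print_fix_steps
```

  Conflicts, if any, would still need to be resolved by hand; the sketch
  does not cover building or installing the resulting kernel.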

  [Where problems could occur]

  If a regression were to occur, it would affect users of btrfs
  filesystems, and would likely show during a routine balance operation.
  Since the issue is triggered during the cancellation of a balance
  operation, problems might occur for users with nearly full filesystems
  or filesystems that have existing corruption.

  We are replacing a patch that was backported during the fixing of
  CVE-2019-19036, and replacing it with a backport provided by upstream
  developers, which cherry picks from 5.4.y to Bionic. The patch in
  5.4.y is well tested by the community and is currently in the Focal
  kernel.

  With all modifications to btrfs, there is a risk of data corruption
  and filesystem corruption for all btrfs users, since balances happen
  automatically and on a regular basis. If a regression does happen,
  users should remount their filesystems with the "skip_balance" mount
  option, back up their data, and attempt a repair if necessary.

  [Other info]

  A community member has hit this issue before I did, and has reported
  it upstream to linux-btrfs here, although no one knew what was
  happening:

  https://www.spinics.net/lists/linux-btrfs/msg103367.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1933172/+subscriptions
