** Changed in: linux (Ubuntu Focal)
       Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1894780

Title:
  Oops and hang when starting LVM snapshots on 5.4.0-47

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Focal:
  Fix Committed

Bug description:
  [Impact]
  kmem caches will fail to be created right after they have been removed
  but not yet completely torn down. This prevents some drivers (such as
  those behind LVM snapshots) from working properly and causes kernel
  traces to appear in the logs.
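
  The failure mode described above can be pictured with a toy model. This is
  only a sketch, not kernel code: the class ToySlabRegistry, its methods and
  the helper recreate() are hypothetical names, and the idea that removal is
  deferred until a later flush is an assumption made purely for illustration.

```python
import errno


class ToySlabRegistry:
    """Toy stand-in for a name registry such as /sys/kernel/slab (not kernel code)."""

    def __init__(self):
        self.live = set()      # names currently registered
        self.pending = set()   # names destroyed but not yet fully removed

    def create(self, name):
        # A name that is live or still awaiting removal counts as taken.
        if name in self.live or name in self.pending:
            raise FileExistsError(errno.EEXIST, "name still registered", name)
        self.live.add(name)

    def destroy(self, name):
        # Assumption: removal is deferred, so the name lingers until flush().
        self.live.remove(name)
        self.pending.add(name)

    def flush(self):
        # Complete all deferred removals.
        self.pending.clear()


def recreate(registry, name, flush_first):
    """Destroy and immediately recreate `name`; return True on success."""
    registry.destroy(name)
    if flush_first:
        registry.flush()
    try:
        registry.create(name)
        return True
    except FileExistsError:
        return False
```

  In this model, recreating a name before the deferred removal has finished
  fails with EEXIST, which mirrors the kobject_add_internal message in the
  trace further down; recreating it after the flush succeeds.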

  [Test case]
  See comment #9.

  [Regression potential]
  The fix reverts a commit, so we go back to the state of a previously
  released kernel, where a leak was possible. That regression, though, is
  preferable to the current impact, which also leads to a different leak
  and prevents users from correctly using LVM snapshots.

  =========================================================================

  One of my bionic servers with HWE 5.4.0 hangs on boot (apparently
  while starting LVM snapshots) after upgrading from Linux 5.4.0-42 to
  5.4.0-47, with the following trace:

    [   29.126292] kobject_add_internal failed for :a-0000152 with -EEXIST, don't try to register things with the same name in the same directory.
    [   29.138854] BUG: kernel NULL pointer dereference, address: 0000000000000020
    [   29.145977] #PF: supervisor read access in kernel mode
    [   29.145979] #PF: error_code(0x0000) - not-present page
    [   29.145981] PGD 0 P4D 0
    [   29.158800] Oops: 0000 [#1] SMP NOPTI
    [   29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
    [   29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
    [   29.178038] RIP: 0010:free_percpu+0x120/0x1f0
    [   29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 
4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 
58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
    [   29.202530] RSP: 0018:ffffa2f69c3d38e8 EFLAGS: 00010046
    [   29.209204] RAX: 0000000000000000 RBX: ffff92202ff397c0 RCX: 
ffffffffa880a000
    [   29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 
0000000000000000
    [   29.223469] RBP: ffffa2f69c3d3918 R08: 0000000000000000 R09: 
ffffffffa74a5300
    [   29.230609] R10: ffffa2f69c3d3820 R11: 0000000000000000 R12: 
cf35c0f24f14c3c0
    [   29.237745] R13: cf362fb2a054c3c0 R14: 0000000000000287 R15: 
0000000000000008
    [   29.244878] FS:  00007f93a04b0900(0000) GS:ffff913faed80000(0000) 
knlGS:0000000000000000
    [   29.252961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   29.258707] CR2: 0000000000000020 CR3: 0000003fa9d90000 CR4: 
00000000003406e0
    [   29.265883] Call Trace:
    [   29.268346]  __kmem_cache_release+0x1a/0x30
    [   29.273913]  __kmem_cache_create+0x4f9/0x550
    [   29.278192]  ? __kmalloc_node+0x1eb/0x320
    [   29.282205]  ? kvmalloc_node+0x31/0x80
    [   29.285962]  create_cache+0x120/0x1f0
    [   29.291003]  kmem_cache_create_usercopy+0x17d/0x270
    [   29.295882]  kmem_cache_create+0x16/0x20
    [   29.300152]  dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
    [   29.305644]  ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
    [   29.310693]  persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
    [   29.316627]  ? _cond_resched+0x19/0x40
    [   29.320384]  snapshot_ctr+0x79e/0x910 [dm_snapshot]
    [   29.325276]  dm_table_add_target+0x18d/0x370
    [   29.329552]  table_load+0x12a/0x370
    [   29.333045]  ctl_ioctl+0x1e2/0x590
    [   29.336450]  ? retrieve_status+0x1c0/0x1c0
    [   29.340551]  dm_ctl_ioctl+0xe/0x20
    [   29.343958]  do_vfs_ioctl+0xa9/0x640
    [   29.347547]  ? ksys_semctl.constprop.19+0xf7/0x190
    [   29.352337]  ksys_ioctl+0x75/0x80
    [   29.355663]  __x64_sys_ioctl+0x1a/0x20
    [   29.359421]  do_syscall_64+0x57/0x190
    [   29.363094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [   29.368144] RIP: 0033:0x7f939f0286d7
    [   29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 
c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 
01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
    [   29.390478] RSP: 002b:00007ffe918df168 EFLAGS: 00000202 ORIG_RAX: 
0000000000000010
    [   29.398045] RAX: ffffffffffffffda RBX: 0000561c107f672c RCX: 
00007f939f0286d7
    [   29.405175] RDX: 0000561c1107c610 RSI: 00000000c138fd09 RDI: 
0000000000000009
    [   29.412309] RBP: 00007ffe918df220 R08: 00007f939f59d120 R09: 
00007ffe918defd0
    [   29.419442] R10: 0000561c1107c6c0 R11: 0000000000000202 R12: 
00007f939f59c4e6
    [   29.426623] R13: 00007f939f59c4e6 R14: 00007f939f59c4e6 R15: 
00007f939f59c4e6
    [   29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero 
nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd 
kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel 
ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core 
bcache crc64 hid_generic crct10dif_pclmul mlx5_core crc32_pclmul ast 
ghash_clmulni_intel drm_vram_helper pci_hyperv_intf ttm aesni_intel mpt3sas 
nvme crypto_simd drm_kms_helper syscopyarea igb cryptd raid_class sysfillrect 
ahci tls sysimgblt glue_helper dca usbhid fb_sys_fops libahci nvme_core mlxfw 
i2c_algo_bit scsi_transport_sas drm hid i2c_piix4
    [   29.507853] CR2: 0000000000000020
    [   29.511174] ---[ end trace 43bd923f80cbdf52 ]---

  That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a
  working kernel shows some trouble there:

    $ uname -a
    Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
    $ ls -l /sys/kernel/slab | grep a-0000152
    lrwxrwxrwx 1 root root 0 Sep  8 03:20 dm_bufio_buffer -> :a-0000152

  So on 5.4.0-42 the named node doesn't get created, but at least it
  doesn't crash. The same thing is visible on my 5.8.0-18 desktop, but I
  can't reproduce the crash on other machines with snapshot thin volumes
  despite it happening every time (even with maxcpus=1) on the affected
  system.
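
  The `ls -l /sys/kernel/slab | grep a-0000152` check above can be
  generalised to every alias in that directory. The following is a minimal
  sketch, assuming the slab directory contains ordinary symlinks as the
  listing suggests; the function names slab_aliases and dangling_aliases are
  made up for this example.

```python
import os


def slab_aliases(slab_dir="/sys/kernel/slab"):
    """Map each symlink name in slab_dir to the target it points at."""
    aliases = {}
    for entry in os.scandir(slab_dir):
        if entry.is_symlink():
            aliases[entry.name] = os.readlink(entry.path)
    return aliases


def dangling_aliases(slab_dir="/sys/kernel/slab"):
    """Return the sorted names of symlinks whose target does not exist."""
    dangling = []
    for name in slab_aliases(slab_dir):
        # os.path.exists() follows the symlink and is False if it is broken.
        if not os.path.exists(os.path.join(slab_dir, name)):
            dangling.append(name)
    return sorted(dangling)


if __name__ == "__main__":
    for name in dangling_aliases():
        print(name)
```

  On a system showing the behaviour described here, dm_bufio_buffer would be
  expected to appear in the output, since its target :a-0000152 is missing.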

  It should be noted that LVM was not in use on this system until just
  before it was rebooted into the new kernel, but downgrading to -42
  does work so it seems like a coincidence. Before I realised it was a
  recent regression I dug through mm/slub.c's history and found dde3c6b7
  ("mm/slub: fix a memory leak in sysfs_slab_add()") kind of suspicious
  -- it ostensibly fixes a leak from 80da026a ("mm/slub: fix slab
  double-free in case of duplicate sysfs filename"), exactly the
  codepath that seems to crash here.

  There's clearly some existing bug causing the slab sysfs node to not
  be added, and I guess dde3c6b7 turns that into a crash on some
  systems. This is a test system, so I can do whatever debugging is
  required to narrow down the trigger.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894780/+subscriptions
