Public bug reported:

One of my bionic servers with HWE 5.4.0 hangs on boot (apparently while
starting LVM snapshots) after upgrading from Linux 5.4.0-42 to 5.4.0-47,
with the following trace:

  [   29.126292] kobject_add_internal failed for :a-0000152 with -EEXIST, don't try to register things with the same name in the same directory.
  [   29.138854] BUG: kernel NULL pointer dereference, address: 0000000000000020
  [   29.145977] #PF: supervisor read access in kernel mode
  [   29.145979] #PF: error_code(0x0000) - not-present page
  [   29.145981] PGD 0 P4D 0
  [   29.158800] Oops: 0000 [#1] SMP NOPTI
  [   29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
  [   29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
  [   29.178038] RIP: 0010:free_percpu+0x120/0x1f0
  [   29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
  [   29.202530] RSP: 0018:ffffa2f69c3d38e8 EFLAGS: 00010046
  [   29.209204] RAX: 0000000000000000 RBX: ffff92202ff397c0 RCX: ffffffffa880a000
  [   29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 0000000000000000
  [   29.223469] RBP: ffffa2f69c3d3918 R08: 0000000000000000 R09: ffffffffa74a5300
  [   29.230609] R10: ffffa2f69c3d3820 R11: 0000000000000000 R12: cf35c0f24f14c3c0
  [   29.237745] R13: cf362fb2a054c3c0 R14: 0000000000000287 R15: 0000000000000008
  [   29.244878] FS:  00007f93a04b0900(0000) GS:ffff913faed80000(0000) knlGS:0000000000000000
  [   29.252961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   29.258707] CR2: 0000000000000020 CR3: 0000003fa9d90000 CR4: 00000000003406e0
  [   29.265883] Call Trace:
  [   29.268346]  __kmem_cache_release+0x1a/0x30
  [   29.273913]  __kmem_cache_create+0x4f9/0x550
  [   29.278192]  ? __kmalloc_node+0x1eb/0x320
  [   29.282205]  ? kvmalloc_node+0x31/0x80
  [   29.285962]  create_cache+0x120/0x1f0
  [   29.291003]  kmem_cache_create_usercopy+0x17d/0x270
  [   29.295882]  kmem_cache_create+0x16/0x20
  [   29.300152]  dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
  [   29.305644]  ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
  [   29.310693]  persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
  [   29.316627]  ? _cond_resched+0x19/0x40
  [   29.320384]  snapshot_ctr+0x79e/0x910 [dm_snapshot]
  [   29.325276]  dm_table_add_target+0x18d/0x370
  [   29.329552]  table_load+0x12a/0x370
  [   29.333045]  ctl_ioctl+0x1e2/0x590
  [   29.336450]  ? retrieve_status+0x1c0/0x1c0
  [   29.340551]  dm_ctl_ioctl+0xe/0x20
  [   29.343958]  do_vfs_ioctl+0xa9/0x640
  [   29.347547]  ? ksys_semctl.constprop.19+0xf7/0x190
  [   29.352337]  ksys_ioctl+0x75/0x80
  [   29.355663]  __x64_sys_ioctl+0x1a/0x20
  [   29.359421]  do_syscall_64+0x57/0x190
  [   29.363094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [   29.368144] RIP: 0033:0x7f939f0286d7
  [   29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
  [   29.390478] RSP: 002b:00007ffe918df168 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  [   29.398045] RAX: ffffffffffffffda RBX: 0000561c107f672c RCX: 00007f939f0286d7
  [   29.405175] RDX: 0000561c1107c610 RSI: 00000000c138fd09 RDI: 0000000000000009
  [   29.412309] RBP: 00007ffe918df220 R08: 00007f939f59d120 R09: 00007ffe918defd0
  [   29.419442] R10: 0000561c1107c6c0 R11: 0000000000000202 R12: 00007f939f59c4e6
  [   29.426623] R13: 00007f939f59c4e6 R14: 00007f939f59c4e6 R15: 00007f939f59c4e6
  [   29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core bcache crc64 hid_generic crct10dif_pclmul mlx5_core crc32_pclmul ast ghash_clmulni_intel drm_vram_helper pci_hyperv_intf ttm aesni_intel mpt3sas nvme crypto_simd drm_kms_helper syscopyarea igb cryptd raid_class sysfillrect ahci tls sysimgblt glue_helper dca usbhid fb_sys_fops libahci nvme_core mlxfw i2c_algo_bit scsi_transport_sas drm hid i2c_piix4
  [   29.507853] CR2: 0000000000000020
  [   29.511174] ---[ end trace 43bd923f80cbdf52 ]---

That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a
working kernel shows some trouble there:

  $ uname -a
  Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  $ ls -l /sys/kernel/slab | grep a-0000152
  lrwxrwxrwx 1 root root 0 Sep  8 03:20 dm_bufio_buffer -> :a-0000152

So on 5.4.0-42 the named node doesn't get created either, but at least
the kernel doesn't crash. The same dangling alias is visible on my
5.8.0-18 desktop. However, I can't reproduce the crash on other machines
with snapshot thin volumes, even though it happens every time (even with
maxcpus=1) on the affected system.
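
One way to check other machines for the same symptom is to look for alias
symlinks under /sys/kernel/slab whose target directory is missing. The
following is only a sketch and not part of the original debugging: the
helper name dangling_slab_aliases is made up for illustration, and it
assumes the layout shown above, where aliases are relative symlinks to
sibling entries.

```python
import os


def dangling_slab_aliases(slab_dir="/sys/kernel/slab"):
    """Return {alias: target} for symlinks in slab_dir whose target is missing.

    Hypothetical helper for illustration. slab_dir is a parameter so the
    function can be pointed at a scratch directory instead of real sysfs.
    """
    if not os.path.isdir(slab_dir):
        # No slab sysfs directory here (different allocator, or not Linux).
        return {}
    dangling = {}
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        if not os.path.islink(path):
            continue
        # os.path.exists() follows the link, so it is False for a broken one.
        if not os.path.exists(path):
            dangling[name] = os.readlink(path)
    return dangling
```

On a system showing the symptom above, the expectation is that the result
contains an entry mapping dm_bufio_buffer to :a-0000152. Reading
/sys/kernel/slab may require root on some configurations.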

It should be noted that LVM was not in use on this system until just
before it was rebooted into the new kernel, but downgrading to -42 does
work, so the timing seems to be a coincidence. Before I realised it was a
recent regression I dug through mm/slub.c's history and found dde3c6b7
("mm/slub: fix a memory leak in sysfs_slab_add()") suspicious --
it ostensibly fixes a leak from 80da026a ("mm/slub: fix slab double-free
in case of duplicate sysfs filename"), exactly the codepath that seems
to crash here.

There's clearly some existing bug causing the slab sysfs node to not be
added, and I guess dde3c6b7 turns that into a crash on some systems.
This is a test system, so I can do whatever debugging is required to
narrow down the trigger.
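
As a cross-check on the trace, the faulting address can be derived from the
"Code:" line and the register dump. The sketch below assumes standard x86-64
instruction encoding and only handles the single instruction form that
appears at the <48> marker; the function name decode_faulting_mov is made up
for illustration. It shows that the fault is a load from offset 0x20 of a
null pointer in RAX, and says nothing about why that pointer was null.

```python
# Bytes copied from the kernel "Code:" line; <48> marks the faulting instruction.
CODE = (
    "43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 "
    "48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 "
    "<48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45"
)
RAX = 0x0000000000000000  # from the register dump
CR2 = 0x0000000000000020  # faulting address reported by the oops

# Register numbering used by ModRM when no REX.R/REX.B extension bit is set.
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_faulting_mov(code):
    """Decode 'REX.W 8B /r' with mod=01, i.e. mov disp8(%base), %dest."""
    tokens = code.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    rex, opcode, modrm, disp8 = (
        int(t.strip("<>"), 16) for t in tokens[start:start + 4]
    )
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # Deliberately narrow: reject anything but the form seen in this trace
    # (rm == 4 would mean a SIB byte follows, which is not handled here).
    if rex != 0x48 or opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("instruction form not handled by this sketch")
    if disp8 >= 0x80:  # disp8 is a signed byte
        disp8 -= 0x100
    return REGS64[reg], REGS64[rm], disp8


dest, base, disp = decode_faulting_mov(CODE)
print(f"mov {disp:#x}(%{base}), %{dest}")
print(f"effective address = RAX + {disp:#x} = {RAX + disp:#x}, CR2 = {CR2:#x}")
```

The decoded base register is RAX, which the dump shows as zero, so the
computed address matches CR2.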

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Description changed:

- That :a-0000152 is meant to be /sys/kernel/debug/:a-0000152. Even a
+ That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a
  working kernel shows some trouble there:
  

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1894780

Title:
  Oops and hang when starting LVM snapshots on 5.4.0-47
