[Kernel-packages] [Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

William Grant Tue, 08 Sep 2020 18:21:11 -0700

https://lore.kernel.org/linux-mm/alpine.lrh.2.02.1806151817130.6...@file01.intranet.prod.int.rdu2.redhat.com/
(2018's "slub: fix failure when we delete and create a slab cache")
looks relevant to similar problems with this particular slub callsite.


-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1894780

Title:
  Oops and hang when starting LVM snapshots on 5.4.0-47

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Focal:
  New

Bug description:
  One of my bionic servers with HWE 5.4.0 hangs on boot (apparently
  while starting LVM snapshots) after upgrading from Linux 5.4.0-42 to
  5.4.0-47, with the following trace:

    [   29.126292] kobject_add_internal failed for :a-0000152 with -EEXIST, don't try to register things with the same name in the same directory.
    [   29.138854] BUG: kernel NULL pointer dereference, address: 0000000000000020
    [   29.145977] #PF: supervisor read access in kernel mode
    [   29.145979] #PF: error_code(0x0000) - not-present page
    [   29.145981] PGD 0 P4D 0
    [   29.158800] Oops: 0000 [#1] SMP NOPTI
    [   29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
    [   29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
    [   29.178038] RIP: 0010:free_percpu+0x120/0x1f0
    [   29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
    [   29.202530] RSP: 0018:ffffa2f69c3d38e8 EFLAGS: 00010046
    [   29.209204] RAX: 0000000000000000 RBX: ffff92202ff397c0 RCX: ffffffffa880a000
    [   29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 0000000000000000
    [   29.223469] RBP: ffffa2f69c3d3918 R08: 0000000000000000 R09: ffffffffa74a5300
    [   29.230609] R10: ffffa2f69c3d3820 R11: 0000000000000000 R12: cf35c0f24f14c3c0
    [   29.237745] R13: cf362fb2a054c3c0 R14: 0000000000000287 R15: 0000000000000008
    [   29.244878] FS:  00007f93a04b0900(0000) GS:ffff913faed80000(0000) knlGS:0000000000000000
    [   29.252961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   29.258707] CR2: 0000000000000020 CR3: 0000003fa9d90000 CR4: 00000000003406e0
    [   29.265883] Call Trace:
    [   29.268346]  __kmem_cache_release+0x1a/0x30
    [   29.273913]  __kmem_cache_create+0x4f9/0x550
    [   29.278192]  ? __kmalloc_node+0x1eb/0x320
    [   29.282205]  ? kvmalloc_node+0x31/0x80
    [   29.285962]  create_cache+0x120/0x1f0
    [   29.291003]  kmem_cache_create_usercopy+0x17d/0x270
    [   29.295882]  kmem_cache_create+0x16/0x20
    [   29.300152]  dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
    [   29.305644]  ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
    [   29.310693]  persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
    [   29.316627]  ? _cond_resched+0x19/0x40
    [   29.320384]  snapshot_ctr+0x79e/0x910 [dm_snapshot]
    [   29.325276]  dm_table_add_target+0x18d/0x370
    [   29.329552]  table_load+0x12a/0x370
    [   29.333045]  ctl_ioctl+0x1e2/0x590
    [   29.336450]  ? retrieve_status+0x1c0/0x1c0
    [   29.340551]  dm_ctl_ioctl+0xe/0x20
    [   29.343958]  do_vfs_ioctl+0xa9/0x640
    [   29.347547]  ? ksys_semctl.constprop.19+0xf7/0x190
    [   29.352337]  ksys_ioctl+0x75/0x80
    [   29.355663]  __x64_sys_ioctl+0x1a/0x20
    [   29.359421]  do_syscall_64+0x57/0x190
    [   29.363094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [   29.368144] RIP: 0033:0x7f939f0286d7
    [   29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
    [   29.390478] RSP: 002b:00007ffe918df168 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
    [   29.398045] RAX: ffffffffffffffda RBX: 0000561c107f672c RCX: 00007f939f0286d7
    [   29.405175] RDX: 0000561c1107c610 RSI: 00000000c138fd09 RDI: 0000000000000009
    [   29.412309] RBP: 00007ffe918df220 R08: 00007f939f59d120 R09: 00007ffe918defd0
    [   29.419442] R10: 0000561c1107c6c0 R11: 0000000000000202 R12: 00007f939f59c4e6
    [   29.426623] R13: 00007f939f59c4e6 R14: 00007f939f59c4e6 R15: 00007f939f59c4e6
    [   29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core bcache crc64 hid_generic crct10dif_pclmul mlx5_core crc32_pclmul ast ghash_clmulni_intel drm_vram_helper pci_hyperv_intf ttm aesni_intel mpt3sas nvme crypto_simd drm_kms_helper syscopyarea igb cryptd raid_class sysfillrect ahci tls sysimgblt glue_helper dca usbhid fb_sys_fops libahci nvme_core mlxfw i2c_algo_bit scsi_transport_sas drm hid i2c_piix4
    [   29.507853] CR2: 0000000000000020
    [   29.511174] ---[ end trace 43bd923f80cbdf52 ]---

  The :a-0000152 in the first line of the trace is meant to be
  /sys/kernel/slab/:a-0000152. Even a working kernel shows some trouble
  there:

    $ uname -a
    Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
    $ ls -l /sys/kernel/slab | grep a-0000152
    lrwxrwxrwx 1 root root 0 Sep  8 03:20 dm_bufio_buffer -> :a-0000152

  So on 5.4.0-42 the named node doesn't get created either, but at least
  the kernel doesn't crash. The same missing node is visible on my
  5.8.0-18 desktop. I can't reproduce the crash on other machines with
  snapshot thin volumes, even though it happens on every boot (even with
  maxcpus=1) on the affected system.
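
The `ls | grep` check above can be widened to every alias in the directory. A minimal sketch, assuming Python 3 is available on the host; the function name `dangling_slab_aliases` is hypothetical and not part of any existing tool. It only reports symlinks whose target is absent and says nothing about why the node was not created:

```python
import os

def dangling_slab_aliases(slab_dir="/sys/kernel/slab"):
    """Return (alias, target) pairs for symlinks whose target is missing.

    slab_dir defaults to the SLUB sysfs directory; it is a parameter so the
    check can also be pointed at a copy of the tree.
    """
    if not os.path.isdir(slab_dir):
        return []
    dangling = []
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        # os.path.exists() follows symlinks, so it is False for a broken link.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append((name, os.readlink(path)))
    return dangling

if __name__ == "__main__":
    for alias, target in dangling_slab_aliases():
        print(f"{alias} -> {target} (target missing)")
```

On a system in the state shown above, this would be expected to list dm_bufio_buffer alongside any other alias pointing at a node that was never registered.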

  LVM was not in use on this system until just before it was rebooted
  into the new kernel, but downgrading to -42 works, so that timing
  seems to be a coincidence. Before I realised this was a recent
  regression I dug through the history of mm/slub.c and found dde3c6b7
  ("mm/slub: fix a memory leak in sysfs_slab_add()") suspicious: it
  ostensibly fixes a leak from 80da026a ("mm/slub: fix slab double-free
  in case of duplicate sysfs filename"), exactly the codepath that seems
  to crash here.

  There's clearly some existing bug causing the slab sysfs node to not
  be added, and I guess dde3c6b7 turns that into a crash on some
  systems. This is a test system, so I can do whatever debugging is
  required to narrow down the trigger.
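
When comparing boots while narrowing down the trigger, the two lines of most interest in a trace like the one above (the kobject_add_internal failure and the kernel-mode RIP) can be pulled out of a saved dmesg capture. A rough sketch, assuming Python 3; the function name `summarize_oops` and the regular expressions are illustrative assumptions based only on the log format shown in this report:

```python
import re

# Matches e.g. "kobject_add_internal failed for :a-0000152 with -EEXIST"
KOBJ_RE = re.compile(r"kobject_add_internal failed for (\S+) with (-E[A-Z]+)")
# Matches a kernel-mode RIP line, e.g. "RIP: 0010:free_percpu+0x120/0x1f0";
# the user-space "RIP: 0033:0x..." line is deliberately not matched.
RIP_RE = re.compile(r"RIP: 0010:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

def summarize_oops(log_text):
    """Extract the duplicate kobject name, its errno, and the faulting symbol."""
    kobj = KOBJ_RE.search(log_text)
    rip = RIP_RE.search(log_text)
    return {
        "kobject": kobj.group(1) if kobj else None,
        "errno": kobj.group(2) if kobj else None,
        "rip_symbol": rip.group(1) if rip else None,
    }
```

Calling `summarize_oops()` on the text of a saved capture returns a small dict that can be compared across kernel versions or boots; fields are `None` when the corresponding line is absent.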

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894780/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
