We've seen three Lustre client panics in the last few hours while running the
b2_12 branch. We are using it on the client nodes because it patches a Data on
MDT bug in 2.12.3; the MDS/OSS nodes are still on 2.12.3. This looks similar to
LU-12581, which we had seen on our system before but which was fixed in 2.12.3.
Could this have been reintroduced in the b2_12 branch?
I've included the dmesg from one of the panics below. Unfortunately we have not
yet found a way to reproduce the problem. Has anyone seen anything similar to
this?
Is this mailing list a suitable place to ask for help on this sort of bug? I've
been looking at the Whamcloud Community Jira, but the link to request an
account returns "Your Jira administrator has not yet configured this contact
form."
dmesg from failed client:
[542909.741793]
=============================================================================
[542909.741800] BUG kmalloc-8 (Tainted: G OE ------------ ):
Freechain corrupt
[542909.741802]
-----------------------------------------------------------------------------
[542909.741805] Disabling lock debugging due to kernel taint
[542909.741809] INFO: Slab 0xffffe0933440b3c0 objects=102 used=75
fp=0xffff9bb6902cf558 flags=0x6fffff00000081
[542909.741812] INFO: Object 0xffff9bb6902cfad0 @offset=2768
fp=0x7fff9bb6902cfdf0
[542909.741816] Redzone ffff9bb6902cfac8: bb 3b 3b 3b 3b bb bb bb
.;;;;...
[542909.741818] Object ffff9bb6902cfad0: 6b 6b 6b 6b 6b 6b 6b a5
kkkkkkk.
[542909.741821] Redzone ffff9bb6902cfad8: bb bb bb 3b bb bb bb bb
...;....
[542909.741823] Padding ffff9bb6902cfae8: 5a 5a 5a 5a 5a 5a 5a 5a
ZZZZZZZZ
[542909.741828] CPU: 25 PID: 50461 Comm: pool Kdump: loaded Tainted: G B
OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[542909.741830] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[542909.741832] Call Trace:
[542909.741846] [<ffffffffa277ac23>] dump_stack+0x19/0x1b
[542909.741852] [<ffffffffa2221561>] print_trailer+0x161/0x280
[542909.741856] [<ffffffffa2221ebf>] on_freelist+0xff/0x270
[542909.741860] [<ffffffffa27774cc>] free_debug_processing+0x18d/0x270
[542909.741867] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
[542909.741870] [<ffffffffa2223bee>] __slab_free+0x1ce/0x290
[542909.741878] [<ffffffffa2272e58>] ? generic_setxattr+0x68/0x80
[542909.741883] [<ffffffffa2273635>] ? __vfs_setxattr_noperm+0x65/0x1b0
[542909.741889] [<ffffffffa232b7ae>] ? evm_inode_setxattr+0xe/0x10
[542909.741892] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
[542909.741895] [<ffffffffa2223db6>] kfree+0x106/0x140
[542909.741899] [<ffffffffa21ddcb5>] kvfree+0x35/0x40
[542909.741902] [<ffffffffa227399b>] setxattr+0x15b/0x1e0
[542909.741909] [<ffffffffa225c3ed>] ? putname+0x3d/0x60
[542909.741914] [<ffffffffa225d602>] ? user_path_at_empty+0x72/0xc0
[542909.741920] [<ffffffffa224d828>] ? __sb_start_write+0x58/0x120
[542909.741926] [<ffffffffa22802f1>] ? do_utimes+0xf1/0x180
[542909.741930] [<ffffffffa2273c87>] SyS_setxattr+0xb7/0x100
[542909.741937] [<ffffffffa278dede>] system_call_fastpath+0x25/0x2a
[542909.741940]
=============================================================================
[542909.741942] BUG kmalloc-8 (Tainted: G B OE ------------ ): Wrong
object count. Counter is 75 but counted were 95
[542909.741944]
-----------------------------------------------------------------------------
[542909.741947] INFO: Slab 0xffffe0933440b3c0 objects=102 used=75
fp=0xffff9bb6902cf558 flags=0x6fffff00000081
[542909.741951] CPU: 25 PID: 50461 Comm: pool Kdump: loaded Tainted: G B
OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[542909.741953] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[542909.741954] Call Trace:
[542909.741958] [<ffffffffa277ac23>] dump_stack+0x19/0x1b
[542909.741961] [<ffffffffa2221b54>] slab_err+0xb4/0xe0
[542909.741969] [<ffffffffa2030a1e>] ? show_stack+0x4e/0x60
[542909.741972] [<ffffffffa2221561>] ? print_trailer+0x161/0x280
[542909.741975] [<ffffffffa2221f85>] on_freelist+0x1c5/0x270
[542909.742227] [<ffffffffa27774cc>] free_debug_processing+0x18d/0x270
[542909.742479] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
[542909.742483] [<ffffffffa2223bee>] __slab_free+0x1ce/0x290
[542909.742488] [<ffffffffa2272e58>] ? generic_setxattr+0x68/0x80
[542909.742491] [<ffffffffa2273635>] ? __vfs_setxattr_noperm+0x65/0x1b0
[542909.742495] [<ffffffffa232b7ae>] ? evm_inode_setxattr+0xe/0x10
[542909.742498] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
[542909.742501] [<ffffffffa2223db6>] kfree+0x106/0x140
[542909.742504] [<ffffffffa21ddcb5>] kvfree+0x35/0x40
[542909.742508] [<ffffffffa227399b>] setxattr+0x15b/0x1e0
[542909.742511] [<ffffffffa225c3ed>] ? putname+0x3d/0x60
[542909.742515] [<ffffffffa225d602>] ? user_path_at_empty+0x72/0xc0
[542909.742519] [<ffffffffa224d828>] ? __sb_start_write+0x58/0x120
[542909.742523] [<ffffffffa22802f1>] ? do_utimes+0xf1/0x180
[542909.742527] [<ffffffffa2273c87>] SyS_setxattr+0xb7/0x100
[542909.742530] [<ffffffffa278dede>] system_call_fastpath+0x25/0x2a
[542909.742533] FIX kmalloc-8: Object count adjusted.
[542909.742536]
=============================================================================
[542909.742538] BUG kmalloc-8 (Tainted: G B OE ------------ ):
Redzone overwritten
[542909.742539]
-----------------------------------------------------------------------------
[542909.742543] INFO: 0xffff9bb6902cf858-0xffff9bb6902cf85f. First byte 0x4c
instead of 0xcc
[542909.742545] INFO: Slab 0xffffe0933440b3c0 objects=102 used=95
fp=0xffff9bb6902cf558 flags=0x6fffff00000081
[542909.742547] INFO: Object 0xffff9bb6902cf850 @offset=2128
fp=0x7f7f1b36102c7c10
[542909.742550] Redzone ffff9bb6902cf848: cc cc cc cc cc cc cc cc
........
[542909.742552] Object ffff9bb6902cf850: d0 0b d6 0b 88 01 00 25
.......%
[542909.742555] Redzone ffff9bb6902cf858: 4c 4c 4c 4c 4c 4c 4c 4c
LLLLLLLL
[542909.742557] Padding ffff9bb6902cf868: 5a 5a 5a 5a 5a 5a 5a 5a
ZZZZZZZZ
[542909.742560] CPU: 25 PID: 50461 Comm: pool Kdump: loaded Tainted: G B
OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[542909.742562] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[542909.742563] Call Trace:
[542909.742567] [<ffffffffa277ac23>] dump_stack+0x19/0x1b
[542909.742570] [<ffffffffa2221561>] print_trailer+0x161/0x280
[542909.742573] [<ffffffffa22217ef>] check_bytes_and_report+0xcf/0x110
[542909.742576] [<ffffffffa222237d>] check_object+0x1dd/0x2a0
[542909.742580] [<ffffffffa27773cc>] free_debug_processing+0x8d/0x270
[542909.742583] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
[542909.742586] [<ffffffffa2223bee>] __slab_free+0x1ce/0x290
[542909.742590] [<ffffffffa2272e58>] ? generic_setxattr+0x68/0x80
[542909.742593] [<ffffffffa2273635>] ? __vfs_setxattr_noperm+0x65/0x1b0
[542909.742596] [<ffffffffa232b7ae>] ? evm_inode_setxattr+0xe/0x10
[542909.742599] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
[542909.742602] [<ffffffffa2223db6>] kfree+0x106/0x140
[542909.742606] [<ffffffffa21ddcb5>] kvfree+0x35/0x40
[542909.742609] [<ffffffffa227399b>] setxattr+0x15b/0x1e0
[542909.742613] [<ffffffffa225c3ed>] ? putname+0x3d/0x60
[542909.742617] [<ffffffffa225d602>] ? user_path_at_empty+0x72/0xc0
[542909.742621] [<ffffffffa224d828>] ? __sb_start_write+0x58/0x120
[542909.742624] [<ffffffffa22802f1>] ? do_utimes+0xf1/0x180
[542909.742628] [<ffffffffa2273c87>] SyS_setxattr+0xb7/0x100
[542909.742631] [<ffffffffa278dede>] system_call_fastpath+0x25/0x2a
[542909.742635] FIX kmalloc-8: Restoring
0xffff9bb6902cf858-0xffff9bb6902cf85f=0xcc
[542909.742648] FIX kmalloc-8: Object at 0xffff9bb6902cf850 not freed
[542909.763926] general protection fault: 0000 [#1] SMP
[542909.792826] Modules linked in: tcp_diag inet_diag fuse nfsd mgc(OE)
lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE)
ptlrpc(OE) obdclass(OE) cts lnet(OE) rpcsec_gss_krb5 nfsv4 dns_resolver
libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE)
ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE)
ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_ipv4
nf_defrag_ipv4 xt_multiport xt_recent xt_conntrack nf_conntrack iptable_filter
mlx4_ib(OE) dm_mirror dm_region_hash dm_log dm_mod ib_uverbs(OE) ib_core(OE)
sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel mgag200
mlx4_core(OE) iTCO_wdt iTCO_vendor_support ttm kvm drm_kms_helper irqbypass
syscopyarea sysfillrect crc32_pclmul sysimgblt crc32c_intel
[542910.218156] fb_sys_fops mlx_compat(OE) ghash_clmulni_intel drm aesni_intel
lrw gf128mul glue_helper ses ablk_helper devlink enclosure cryptd
drm_panel_orientation_quirks hpwdt i2c_i801 pcspkr pcc_cpufreq wmi ioatdma
ipmi_si acpi_power_meter ipmi_devintf ipmi_msghandler lpc_ich knem(OE)
binfmt_misc auth_rpcgss ip_tables smartpqi bridge stp llc xfs isci libsas
qla3xxx e1000e igb i2c_algo_bit megaraid_sas aacraid aic79xx ata_piix mpt2sas
raid_class mptspi scsi_transport_spi mptsas mptscsih mptbase arcmsr ahci
libahci sata_nv sata_svw bnx2x libcrc32c bnx2 ext4 mbcache jbd2 sata_sil libata
tg3 e1000 nfsv3 nfs_acl nfs lockd grace sunrpc fscache tun sd_mod crc_t10dif
crct10dif_generic sg ixgbe crct10dif_pclmul crct10dif_common hpsa dca mdio
hpilo ptp scsi_transport_sas pps_core [last unloaded: ipmi_msghandler]
[542910.624054]
[542910.625230] CPU: 27 PID: 25861 Comm: gdbus Kdump: loaded Tainted: G B
OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[542910.685731] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[542910.724144] task: ffff9ba5b5bc1070 ti: ffff9ba6067c0000 task.ti:
ffff9ba6067c0000
[542910.768155] RIP: 0010:[<ffffffffa21f711b>] [<ffffffffa21f711b>]
find_vma+0x3b/0x60
[542910.810986] RSP: 0000:ffff9ba6067c3ea8 EFLAGS: 00010202
[542910.840760] RAX: ffff9bb72066f1b8 RBX: 0000000000000004 RCX:
ffff9ba6067c3fd8
[542910.880983] RDX: 7fff9bb7c2fec608 RSI: 0000000000682888 RDI:
ffff9ba002a34b00
[542910.919946] RBP: ffff9ba6067c3ea8 R08: 0000000000000001 R09:
0000000000000000
[542910.958846] R10: 000000000000001c R11: 00002aaaae480b40 R12:
00000000000000a8
[542910.998593] R13: 0000000000682888 R14: ffff9ba6067c3f58 R15:
ffff9ba002a34b00
[542911.038992] FS: 00002aaabc395700(0000) GS:ffff9bb97f140000(0000)
knlGS:0000000000000000
[542911.095715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[542911.155694] CR2: 0000000000682888 CR3: 0000003214b00000 CR4:
00000000003607e0
[542911.202949] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[542911.265589] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[542911.315387] Call Trace:
[542911.355844] [<ffffffffa278857d>] __do_page_fault+0x13d/0x500
[542911.413348] [<ffffffffa2788975>] do_page_fault+0x35/0x90
[542911.455443] [<ffffffffa2784778>] page_fault+0x28/0x30
[542911.495307] Code: 74 06 48 39 70 08 77 40 48 8b 57 08 31 c0 48 85 d2 75 18
eb 2e 0f 1f 00 48 3b 72 e0 48 8d 42 e0 73 1d 48 8b 52 10 48 85 d2 74 0f <48> 3b
72 e8 72 e7 48 8b 52 08 48 85 d2 75 f1 48 85 c0 74 04 48
[542911.665436] RIP [<ffffffffa21f711b>] find_vma+0x3b/0x60
[542911.695917] RSP <ffff9ba6067c3ea8>
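
One way to read the redzone lines in the dump is to compare each byte with the
value SLUB debugging would normally write there. The following is a minimal
sketch, assuming the usual poison values from include/linux/poison.h (0xbb for
the redzone of a free object, 0xcc for the redzone of an allocated one). The
helper names are made up for illustration, and the byte strings are copied from
the dmesg above. For these three redzones, every corrupted byte differs from
the expected value only in its top bit (0x80); the sketch says nothing about
what caused that.

```python
# Expected SLUB debug redzone values (assumed from include/linux/poison.h).
SLUB_RED_INACTIVE = 0xBB  # redzone of a free object
SLUB_RED_ACTIVE = 0xCC    # redzone of an allocated object

# Redzone bytes copied from the dmesg output, keyed by redzone address.
observed = {
    "ffff9bb6902cfac8": (SLUB_RED_INACTIVE, "bb 3b 3b 3b 3b bb bb bb"),
    "ffff9bb6902cfad8": (SLUB_RED_INACTIVE, "bb bb bb 3b bb bb bb bb"),
    "ffff9bb6902cf858": (SLUB_RED_ACTIVE, "4c 4c 4c 4c 4c 4c 4c 4c"),
}


def flipped_bits(expected, hexdump):
    """Return the distinct non-zero XOR differences between expected and observed bytes."""
    return {expected ^ int(byte, 16) for byte in hexdump.split()} - {0}


def summarise(table):
    """Map each redzone address to the sorted list of bit patterns that changed."""
    return {
        addr: sorted(hex(diff) for diff in flipped_bits(expected, dump))
        for addr, (expected, dump) in table.items()
    }


if __name__ == "__main__":
    for addr, diffs in summarise(observed).items():
        print(addr, diffs)
```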
--
# Dr. Christopher Mountford
# System specialist - Research Computing/HPC
#
# IT services,
# University of Leicester, University Road,
# Leicester, LE1 7RH, UK
#
# t: 0116 252 3471
# e: [email protected]
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org