I ran the ubuntu_kernel_selftests ftrace tests manually on the
5.4.0-71.79-generic kernel.  I was able to reproduce the panic, though
not on every run.  Even on runs without a panic, dmesg reported several
soft lockups.

After removing the MOFED dkms, I could no longer reproduce the panic or
any of the soft lockups seen previously.  I don't yet have evidence as
to which MOFED module might be triggering the problem.
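
For comparing runs with and without MOFED installed, something like the
following could tally the relevant messages in a saved dmesg capture.
This is only a rough sketch, not the procedure used for the testing
above: the function name and the choice of substrings are my own
assumptions, and the patterns match loosely on the text the kernel
prints for soft lockups, NULL pointer dereferences and panics.

```python
import re
from collections import Counter

# Loose patterns for the failure modes discussed in this bug.
# These are illustrative choices, not an exhaustive list.
SIGNATURES = {
    "soft_lockup": re.compile(r"soft lockup - CPU#\d+"),
    "null_deref": re.compile(r"BUG: kernel NULL pointer dereference"),
    "panic": re.compile(r"\] Kernel panic - not syncing"),
}


def count_signatures(log_text):
    """Count log lines matching each signature in a dmesg capture."""
    counts = Counter({name: 0 for name in SIGNATURES})
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Given a capture saved per run (for example with "dmesg > run1.log",
where the file name is just a placeholder), calling
count_signatures(open("run1.log").read()) returns the per-signature
counts, which makes it easy to compare a run with MOFED against one
without it.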

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1922387

Title:
  BUG: kernel NULL pointer dereference, address: 0000000000000050

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Focal:
  Confirmed
Status in linux source package in Groovy:
  Incomplete
Status in linux source package in Hirsute:
  Incomplete

Bug description:
  I observed the following kernel panic with the 5.4.0-71.79-generic
  kernel while running kernel selftests:

  blanka login: [ 1671.958400] mmiotrace: Error taking CPU253 down: -28
  [ 1672.118199] mmiotrace: Error taking CPU254 down: -28
  [ 1672.230306] mmiotrace: Error taking CPU255 down: -28
  [ 2503.359753] BUG: kernel NULL pointer dereference, address: 0000000000000050
  [ 2503.367527] #PF: supervisor read access in kernel mode
  [ 2503.373257] #PF: error_code(0x0000) - not-present page
  [ 2503.378989] PGD 0 P4D 0 
  [ 2503.381812] Oops: 0000 [#1] SMP NOPTI
  [ 2503.385896] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE     
5.4.0-71-generic #79-Ubuntu
  [ 2503.395795] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 
0.33 01/19/2021
  [ 2503.405027] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100
  [ 2503.411728] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 
8d 7d a0 e8 d0 a4 ca ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 
50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 59 91
  [ 2503.432683] RSP: 0018:ffffa8d6c0003d90 EFLAGS: 00010286
  [ 2503.438513] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000080000100
  [ 2503.446474] RDX: ffff9968a228f418 RSI: 0000000000000100 RDI: 
ffff9968a228f414
  [ 2503.454436] RBP: ffffa8d6c0003df8 R08: ffff9968a228f414 R09: 
0000000000000100
  [ 2503.462394] R10: 0000000000000007 R11: 0000000000000007 R12: 
ffff9968a228f418
  [ 2503.470353] R13: 00000000fffffffa R14: 0000000000000003 R15: 
ffff9a686f9b3000
  [ 2503.478316] FS:  0000000000000000(0000) GS:ffff99690cc00000(0000) 
knlGS:0000000000000000
  [ 2503.487342] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 2503.493752] CR2: 0000000000000050 CR3: 0000007e08ad6000 CR4: 
0000000000340ef0
  [ 2503.501712] Call Trace:
  [ 2503.504438]  <IRQ>
  [ 2503.506682]  wb_timer_fn+0x1d6/0x3c0
  [ 2503.510672]  ? blk_stat_free_callback_rcu+0x30/0x30
  [ 2503.516112]  blk_stat_timer_fn+0x134/0x140
  [ 2503.520683]  call_timer_fn+0x32/0x130
  [ 2503.524768]  __run_timers.part.0+0x180/0x280
  [ 2503.529535]  ? trace_event_raw_event_softirq+0x5d/0xa0
  [ 2503.535267]  run_timer_softirq+0x2a/0x50
  [ 2503.539644]  __do_softirq+0xe1/0x2d6
  [ 2503.543629]  irq_exit+0xae/0xb0
  [ 2503.547132]  smp_apic_timer_interrupt+0x7b/0x140
  [ 2503.552280]  apic_timer_interrupt+0xf/0x20
  [ 2503.556848]  </IRQ>
  [ 2503.559187] RIP: 0010:native_safe_halt+0xe/0x10
  [ 2503.564239] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 
2d 66 dd 52 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 56 dd 52 00 fb f4 <c3> 90 0f 
1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 cd cd 63 ff 65
  [ 2503.585191] RSP: 0018:ffffffff94803e18 EFLAGS: 00000202 ORIG_RAX: 
ffffffffffffff13
  [ 2503.593635] RAX: 000000000001e7c0 RBX: ffff996849080de8 RCX: 
0000000000149022
  [ 2503.601595] RDX: 0000000000149022 RSI: 0000000000000000 RDI: 
ffffffff948c5ba0
  [ 2503.609556] RBP: ffffffff94803e38 R08: 00000000000002a8 R09: 
ffff9968a228f000
  [ 2503.617516] R10: 0000000000000000 R11: 0000000000000002 R12: 
0000000000000000
  [ 2503.625475] R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000000
  [ 2503.633440]  ? default_idle+0x20/0x140
  [ 2503.637623]  arch_cpu_idle+0x15/0x20
  [ 2503.641608]  default_idle_call+0x23/0x30
  [ 2503.645984]  do_idle+0x1fb/0x270
  [ 2503.649583]  cpu_startup_entry+0x20/0x30
  [ 2503.653960]  rest_init+0xae/0xb0
  [ 2503.657563]  arch_call_rest_init+0xe/0x1b
  [ 2503.662025]  start_kernel+0x549/0x56a
  [ 2503.666108]  x86_64_start_reservations+0x24/0x26
  [ 2503.671258]  x86_64_start_kernel+0x75/0x79
  [ 2503.675828]  secondary_startup_64+0xa4/0xb0
  [ 2503.680493] Modules linked in: sch_etf sch_fq dccp_ipv6 dccp_ipv4 dccp 
ip6table_nat iptable_nat xt_nat nf_nat algif_hash af_alg ip6table_filter 
xt_conntrack nf_conntrack nf_defrag_ipv4 ip6_tables nf_defrag_ipv6 ip_vti 
ip6_vti fou6 sit ipip tunnel4 geneve act_mirred cls_basic esp6 authenc echainiv 
iptable_filter xt_policy bpfilter veth esp4_offload esp4 xfrm_user xfrm_algo 
macsec fou vxlan ip6_udp_tunnel udp_tunnel vrf 8021q garp mrp bridge stp llc 
ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre cls_u32 sch_htb dummy 
binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif input_leds cdc_ether usbnet 
mii ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel 
knem(OE) ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 multipath linear ses enclosure ast crct10dif_pclmul 
drm_vram_helper crc32_pclmul ttm
  [ 2503.680569]  ghash_clmulni_intel aesni_intel mlx5_core(OE) crypto_simd 
pci_hyperv_intf drm_kms_helper tls syscopyarea cryptd raid0 glue_helper 
mlxfw(OE) hid_generic sysfillrect igb sysimgblt mpt3sas uas dca mdev(OE) 
fb_sys_fops raid_class i2c_algo_bit usbhid nvme scsi_transport_sas hid 
usb_storage drm mlx_compat(OE) nvme_core i2c_piix4 [last unloaded: trace_printk]
  [ 2503.813546] CR2: 0000000000000050
  [ 2503.817337] ---[ end trace ccd7c184afc3c422 ]---
  [ 2503.933758] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100
  [ 2503.940458] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 
8d 7d a0 e8 d0 a4 ca ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 
50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 59 91
  [ 2503.961410] RSP: 0018:ffffa8d6c0003d90 EFLAGS: 00010286
  [ 2503.967239] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000080000100
  [ 2503.975200] RDX: ffff9968a228f418 RSI: 0000000000000100 RDI: 
ffff9968a228f414
  [ 2503.983161] RBP: ffffa8d6c0003df8 R08: ffff9968a228f414 R09: 
0000000000000100
  [ 2503.991122] R10: 0000000000000007 R11: 0000000000000007 R12: 
ffff9968a228f418
  [ 2503.999083] R13: 00000000fffffffa R14: 0000000000000003 R15: 
ffff9a686f9b3000
  [ 2504.007044] FS:  0000000000000000(0000) GS:ffff99690cc00000(0000) 
knlGS:0000000000000000
  [ 2504.016070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 2504.022479] CR2: 0000000000000050 CR3: 0000007e08ad6000 CR4: 
0000000000340ef0
  [ 2504.030442] Kernel panic - not syncing: Fatal exception in interrupt
  [ 2504.038450] Kernel Offset: 0x12200000 from 0xffffffff81000000 (relocation 
range: 0xffffffff80000000-0xffffffffbfffffff)
  [ 2504.161847] ---[ end Kernel panic - not syncing: Fatal exception in 
interrupt ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1922387/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
