It seems the crash happens in the dummy handler for an interrupt that is used 
for the unlock of a paravirtualized spinlock. It is set when a vcpu is 
initialized but then the interrupt is disabled. So I assume an interrupt 
happens between request_irq and disable_irq. 
I tried 2 things on jammy (5.15.0 kernel on top of the latest in proposed:

1. disabled the paravirtualized spinlock (used xen_nopvspin as boot param) and 
as expected the crash did not occur
2. Tried the irq flag: IRQ_NOAUTOEN, basically this is used to not enable the 
irq when requested.
Unfortunately crash happened again:
[  434.016304] smpboot: Booting Node 0 Processor 2 APIC 0x1
[  434.020165] ------------[ cut here ]------------
[  434.020166] kernel BUG at arch/x86/xen/spinlock.c:62!
[  434.021628] invalid opcode: 0000 [#1] SMP PTI
[  434.022901] CPU: 2 PID: 25 Comm: cpuhp/2 Kdump: loaded Not tainted 
5.15.0-88-generic #98-Ubuntu
[  434.025279] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
[  434.027056] RIP: 0010:dummy_handler+0x0/0x10
[  434.028343] Code: e4 e8 54 e1 7a 00 8b 75 e4 84 c0 74 d2 44 89 ef e8 35 a5 
7a 00 eb d2 44 89 ef e8 cb e1 7a 00 eb c8 66 0f 1f 84 00 00 00 00 00 <0f> 0b 66 
66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 57
[  434.033297] RSP: 0000:ffffa06a40180e28 EFLAGS: 00010046
[  434.034830] RAX: ffffffffa5637ef0 RBX: ffff8e41c88b9b80 RCX: 0000000000000000
[  434.036839] RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000041
[  434.038839] RBP: ffffa06a40180e68 R08: 0000000000000000 R09: ffffffffa757bf68
[  434.040833] R10: 0000000000000000 R11: ffffa06a40180ff8 R12: 0000000000000000
[  434.042818] R13: ffffa06a40180e7c R14: 0000000000000041 R15: ffffa06a40180f98
[  434.044855] FS:  0000000000000000(0000) GS:ffff8e42a8280000(0000) 
knlGS:0000000000000000
[  434.047083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  434.048762] CR2: 0000000000000000 CR3: 0000000149610001 CR4: 00000000001706e0
[  434.050778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  434.052767] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  434.054757] Call Trace:
[  434.056028]  <IRQ>
[  434.057196]  ? show_trace_log_lvl+0x1d6/0x2ea
[  434.058984]  ? show_trace_log_lvl+0x1d6/0x2ea
[  434.060765]  ? handle_irq_event_percpu+0x33/0x80
[  434.062656]  ? show_regs.part.0+0x23/0x29
[  434.064357]  ? __die_body.cold+0x8/0xd
[  434.065977]  ? __die+0x2b/0x37
[  434.067413]  ? die+0x30/0x60
[  434.068818]  ? do_trap+0xbe/0x100
[  434.070345]  ? do_error_trap+0x6f/0xb0
[  434.071947]  ? xen_qlock_wait+0x90/0x90
[  434.073581]  ? exc_invalid_op+0x53/0x70
[  434.075205]  ? xen_qlock_wait+0x90/0x90
[  434.076832]  ? asm_exc_invalid_op+0x1b/0x20
[  434.078546]  ? xen_qlock_wait+0x90/0x90
[  434.080164]  ? xen_qlock_wait+0x90/0x90
[  434.081788]  ? xen_qlock_wait+0x90/0x90
[  434.083411]  ? __handle_irq_event_percpu+0x42/0x170
[  434.085302]  handle_irq_event_percpu+0x33/0x80
[  434.087097]  handle_percpu_irq+0x3d/0x60
[  434.088754]  handle_irq_desc+0x3e/0x50
[  434.090383]  generic_handle_irq+0x1f/0x30
[  434.092056]  handle_irq_for_port+0x8d/0x160
[  434.093769]  ? irq_get_irq_data+0xe/0x20
[  434.095422]  evtchn_2l_handle_events+0x16e/0x3b0
[  434.097253]  __xen_evtchn_do_upcall+0x6c/0xb0
[  434.099016]  xen_hvm_evtchn_do_upcall+0xe/0x20
[  434.100797]  __sysvec_xen_hvm_callback+0x26/0x40
[  434.102625]  sysvec_xen_hvm_callback+0x7b/0x90
[  434.104408]  </IRQ>
[  434.105591]  <TASK>
[  434.106783]  asm_sysvec_xen_hvm_callback+0x1b/0x20
[  434.108651] RIP: 0010:_raw_spin_unlock_irqrestore+0x25/0x30
[  434.110734] Code: eb 8d cc cc cc 0f 1f 44 00 00 55 48 89 e5 e8 4a 08 37 ff 
66 90 f7 c6 00 02 00 00 75 06 5d c3 cc cc cc cc fb 66 0f 1f 44 00 00 <5d> c3 cc 
cc cc cc 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 8b 07
[  434.117127] RSP: 0000:ffffa06a40153cc8 EFLAGS: 00000206
[  434.119113] RAX: 0000000000000001 RBX: ffff8e41ea1bb560 RCX: 000000000002dc00
[  434.121575] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff8e41ea1bb4a4
[  434.124053] RBP: ffffa06a40153cc8 R08: 0000000000000000 R09: ffffffffa757bf68
[  434.126521] R10: 0000000000000246 R11: ffff8e41ea1bb4a4 R12: ffff8e41ea1bb400
[  434.128994] R13: ffff8e41c88b9b80 R14: 0000000000000041 R15: 0000000000000000
[  434.131462]  ? _raw_spin_unlock_irqrestore+0xe/0x30
[  434.133356]  __setup_irq+0x3a8/0x780
[  434.134923]  ? kmem_cache_alloc_trace+0x19e/0x2e0
[  434.136773]  request_threaded_irq+0x112/0x180
[  434.138534]  bind_ipi_to_irqhandler+0xe7/0x3d0
[  434.140335]  ? xen_qlock_wait+0x90/0x90
[  434.141965]  xen_init_lock_cpu+0xb3/0x150
[  434.143641]  ? zhaoxin_pmu_disable_event+0xe0/0xe0
[  434.145512]  xen_cpu_up_online+0xe/0x20
[  434.147149]  cpuhp_invoke_callback+0x10f/0x420
[  434.148932]  ? blk_mq_alloc_request+0xc0/0xc0
[  434.150700]  cpuhp_thread_fun+0xc0/0x1c0
[  434.152354]  ? _raw_spin_unlock_irqrestore+0xe/0x30
[  434.154258]  smpboot_thread_fn+0xba/0x160
[  434.155933]  ? smpboot_register_percpu_thread+0x140/0x140
[  434.157961]  kthread+0x12a/0x150
[  434.159440]  ? set_kthread_struct+0x50/0x50
[  434.161154]  ret_from_fork+0x22/0x30
[  434.162727]  </TASK>
[  434.163938] Modules linked in: cpuid tls binfmt_misc nls_iso8859_1 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd 
cryptd psmouse ixgbevf input_leds serio_raw dm_multipath scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore ip_tables x_tables autofs4

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/2034057

Title:
  function_graph tracer in ftrace related tests triggers kernel bug
  (arch/x86/xen/spinlock.c:62) and kernel panic on AWS cloud c3.xlarge

Status in ubuntu-kernel-tests:
  New
Status in linux-aws package in Ubuntu:
  New
Status in linux-aws source package in Bionic:
  New
Status in linux-aws source package in Focal:
  New
Status in linux-aws source package in Jammy:
  New
Status in linux-aws source package in Lunar:
  New

Bug description:
  Test:
    * ftrace:test.d--00basic--basic2.tc from ubuntu_kselftests_ftrace
    * test_enable_all_tracers() and test_enable_all_tracers() from 
ubuntu_ftrace_smoke_test

  Will crash AWS instance c3.xlarge when testing the "function_graph"
  tracer.

  We have a similar issue filed against Azure (bug 1882669). Filing a
  new bug report because on AWS this is affecting 5.4 ~ 6.2 AWS kernel.
  However on Azure this is not affecting newer kernels.

  Take B-aws-5.4-1108 for example, with the ubuntu_kselftests_ftrace test the 
system will crash with the following error message and reboot (similar to bug 
2029917):
  [  211.675624] kernel BUG at 
/build/linux-aws-5.4-VHOhmt/linux-aws-5.4-5.4.0/arch/x86/xen/spinlock.c:62!
  [  211.678258] invalid opcode: 0000 [#1] SMP PTI
  [  211.679596] CPU: 1 PID: 14 Comm: cpuhp/1 Not tainted 5.4.0-1108-aws 
#116~18.04.1-Ubuntu
  [  211.681825] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
  [  211.683728] RIP: 0010:dummy_handler+0x4/0x10
  [  211.685042] Code: 8b 75 e4 74 d6 44 89 e7 e8 f9 88 61 00 eb d6 44 89 e7 e8 
6f ab 61 00 eb cc 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 <0f> 0b 66 
2e 0f 1f 84 00 00 00 00 00 80 3d 59 d0 9f 01 00 75 02 f3
  [  211.690314] RSP: 0000:ffffaecd4000ee38 EFLAGS: 00010046
  [  211.691934] RAX: ffffffffb462e3e0 RBX: 000000000000003b RCX: 
0000000000000000
  [  211.694036] RDX: 0000000000400e00 RSI: 0000000000000000 RDI: 
000000000000003b
  [  211.696159] RBP: ffffaecd4000ee38 R08: ffff8aefa6c036c0 R09: 
ffff8aefa6c038c0
  [  211.698232] R10: 0000000000000000 R11: ffffffffb6064da8 R12: 
0000000000000000
  [  211.700365] R13: 0000000000000000 R14: 0000000000000000 R15: 
ffff8aefa34b8700
  [  211.702482] FS:  0000000000000000(0000) GS:ffff8aefa8040000(0000) 
knlGS:0000000000000000
  [  211.704894] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  211.706598] CR2: 0000000000000000 CR3: 00000001b400a001 CR4: 
00000000001606e0
  [  211.708731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
  [  211.710835] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
  [  211.713448] Call Trace:
  [  211.714796]  <IRQ>
  [  211.715996]  __handle_irq_event_percpu+0x44/0x1a0
  [  211.717907]  handle_irq_event_percpu+0x32/0x80
  [  211.719759]  handle_percpu_irq+0x3d/0x60
  [  211.721474]  generic_handle_irq+0x28/0x40
  [  211.723184]  handle_irq_for_port+0x8f/0xe0
  [  211.724988]  evtchn_2l_handle_events+0x157/0x270
  [  211.726913]  __xen_evtchn_do_upcall+0x76/0xe0
  [  211.728749]  xen_evtchn_do_upcall+0x2b/0x40
  [  211.730520]  xen_hvm_callback_vector+0xf/0x20
  [  211.732271]  </IRQ>
  [  211.733425] RIP: 0010:_raw_spin_unlock_irqrestore+0x15/0x20
  [  211.735470] Code: e8 70 3d 64 ff 4c 29 e0 4c 39 f0 76 cf 80 0b 08 eb 8a 90 
90 90 0f 1f 44 00 00 55 48 89 e5 e8 a6 ad 66 ff 66 90 48 89 f7 57 9d <0f> 1f 44 
00 00 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 c6 07
  [  211.741920] RSP: 0000:ffffaecd400fbcf8 EFLAGS: 00000246 ORIG_RAX: 
ffffffffffffff0c
  [  211.744980] RAX: 0000000000000001 RBX: ffff8aefa34b8700 RCX: 
000000000002cc00
  [  211.747420] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 
0000000000000246
  [  211.749882] RBP: ffffaecd400fbcf8 R08: ffff8aefa6c036c0 R09: 
ffff8aefa6c038c0
  [  211.752340] R10: 0000000000000000 R11: 0000000000000246 R12: 
000000000000003b
  [  211.754785] R13: 0000000000000000 R14: ffff8aef95bdfa00 R15: 
ffff8aef95bdfaa4
  [  211.757249]  __setup_irq+0x456/0x760
  [  211.758779]  ? kmem_cache_alloc_trace+0x170/0x230
  [  211.760639]  request_threaded_irq+0xfb/0x160
  [  211.762358]  bind_ipi_to_irqhandler+0xba/0x1c0
  [  211.764124]  ? xen_qlock_wait+0x90/0x90
  [  211.765734]  ? snr_uncore_mmio_init+0x20/0x20
  [  211.767496]  xen_init_lock_cpu+0x78/0xd0
  [  211.769135]  ? snr_uncore_mmio_init+0x20/0x20
  [  211.770864]  xen_cpu_up_online+0xe/0x20
  [  211.772500]  cpuhp_invoke_callback+0x8a/0x580
  [  211.774233]  cpuhp_thread_fun+0xb8/0x120
  [  211.775866]  smpboot_thread_fn+0xfc/0x170
  [  211.777524]  kthread+0x121/0x140
  [  211.778968]  ? sort_range+0x30/0x30
  [  211.780508]  ? kthread_park+0x90/0x90
  [  211.782073]  ret_from_fork+0x35/0x40
  [  211.783622] Modules linked in: nls_iso8859_1 binfmt_misc serio_raw 
sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ixgbevf
  [  211.796497] ---[ end trace bb7d4e9bb7f852cb ]---
  [  211.798303] RIP: 0010:dummy_handler+0x4/0x10
  [  211.800022] Code: 8b 75 e4 74 d6 44 89 e7 e8 f9 88 61 00 eb d6 44 89 e7 e8 
6f ab 61 00 eb cc 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 <0f> 0b 66 
2e 0f 1f 84 00 00 00 00 00 80 3d 59 d0 9f 01 00 75 02 f3
  [  211.806379] RSP: 0000:ffffaecd4000ee38 EFLAGS: 00010046
  [  211.808356] RAX: ffffffffb462e3e0 RBX: 000000000000003b RCX: 
0000000000000000
  [  211.810792] RDX: 0000000000400e00 RSI: 0000000000000000 RDI: 
000000000000003b
  [  211.813263] RBP: ffffaecd4000ee38 R08: ffff8aefa6c036c0 R09: 
ffff8aefa6c038c0
  [  211.815713] R10: 0000000000000000 R11: ffffffffb6064da8 R12: 
0000000000000000
  [  211.818162] R13: 0000000000000000 R14: 0000000000000000 R15: 
ffff8aefa34b8700
  [  211.820637] FS:  0000000000000000(0000) GS:ffff8aefa8040000(0000) 
knlGS:0000000000000000
  [  211.823821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  211.825904] CR2: 0000000000000000 CR3: 00000001b400a001 CR4: 
00000000001606e0
  [  211.828357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
  [  211.830829] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
  [  211.833284] Kernel panic - not syncing: Fatal exception in interrupt
  [  211.835575] Kernel Offset: 0x33600000 from 0xffffffff81000000 (relocation 
range: 0xffffffff80000000-0xffffffffbfffffff)

  Test output:
   + echo 1
   + . 
/home/ubuntu/autotest/client/tmp/ubuntu_kselftests_ftrace/src/linux/tools/testing/selftests/ftrace/test.d/00basic/basic2.tc
   + test -f available_tracers
   + cat available_tracers
   + echo hwlat
   + echo blk
   + echo mmiotrace
   + echo function_graph
  (Test interrupted here)

  To verify this, you can hack 
linux/tools/testing/selftests/ftrace/test.d/00basic/basic2.tc and add "read -p" 
to check what tracer we're testing:
    #!/bin/sh
    # SPDX-License-Identifier: GPL-2.0
    # description: Basic test for tracers
    # flags: instance
    test -f available_tracers
    for t in `cat available_tracers`; do
      read -p "testing $t" foo
      echo $t > current_tracer
    done
    echo nop > current_tracer

  And run this manually with sudo ./ftracetest -vvv
  test.d/00basic/basic2.tc, you will see the system crashes right away
  with "function_graph".

  For ubuntu_ftrace_smoke_test, the "function_graph" tracer will be
  tested with test_function_graph_tracer(). And since "function_graph"
  tracer is in /sys/kernel/debug/tracing/available_tracers, it will be
  tested with  test_enable_all_tracers() as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2034057/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to