I double-checked ubuntu_ftrace_smoke_test on aws:r5.metal with kernel
6.8.0-86.87, and it passed.

** Tags removed: verification-needed-noble-linux
** Tags added: verification-done-noble-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2121673

Title:
  noble ubuntu_ftrace_smoke_test:mmiotrace timeout on aws:r5.metal

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  Fix Committed

Bug description:
  [Impact]

  This happens with the 6.8.0-80.80 (2025.08.11) generic kernel, and only
  on the aws:r5.metal instance. The 6.12 kernel works fine. Juerg
  found the offending commit to be:

      memcg: drain obj stock on cpu hotplug teardown

      BugLink: https://bugs.launchpad.net/bugs/2119458

      commit 9f01b4954490d4ccdbcc2b9be34a9921ceee9cbb upstream.

      Currently on cpu hotplug teardown, only memcg stock is drained but we
      need to drain the obj stock as well otherwise we will miss the stats
      accumulated on the target cpu as well as the nr_bytes cached. The stats
      include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. In
      addition we are leaking reference to struct obj_cgroup object.

  Because nothing in the upstream patchset depends on this commit, we
  decided to delay applying it until the next SRU cycle.
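  The commit message above describes per-CPU cached state that is lost if
  a CPU goes away without draining it. Below is a minimal userspace sketch of
  that idea, assuming a heavily simplified model: a plain counter stands in
  for the percpu_ref in struct obj_cgroup, the MEMCG_KMEM/NR_SLAB_* stats are
  omitted, and struct objcg, struct obj_stock, refill_stock(),
  drain_obj_stock() and hotplug_cpu_dead() are hypothetical stand-ins loosely
  modeled on mm/memcontrol.c, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of struct obj_cgroup: a plain counter stands in
 * for the kernel's percpu_ref. */
struct objcg {
	long refcnt;
	long flushed_bytes; /* bytes handed back from per-CPU stocks */
};

/* Hypothetical model of one CPU's obj stock. */
struct obj_stock {
	struct objcg *cached_objcg; /* pins one reference while non-NULL */
	unsigned int nr_bytes;      /* pre-charged bytes cached on this CPU */
};

static void objcg_put(struct objcg *objcg)
{
	assert(objcg->refcnt > 0); /* model-only underflow check */
	objcg->refcnt--;
}

/* Cache bytes on a CPU's stock, taking a reference the first time. */
static void refill_stock(struct obj_stock *stock, struct objcg *objcg,
			 unsigned int bytes)
{
	if (stock->cached_objcg != objcg) {
		assert(stock->cached_objcg == NULL); /* model: one objcg per stock */
		objcg->refcnt++;
		stock->cached_objcg = objcg;
	}
	stock->nr_bytes += bytes;
}

/* Flush the cached bytes and detach the objcg. The caller now owns the
 * returned reference; NULL means this CPU had nothing cached. */
static struct objcg *drain_obj_stock(struct obj_stock *stock)
{
	struct objcg *old = stock->cached_objcg;

	if (old) {
		old->flushed_bytes += stock->nr_bytes;
		stock->nr_bytes = 0;
		stock->cached_objcg = NULL;
	}
	return old;
}

/* CPU teardown as the commit intends: drain the obj stock as well,
 * then drop the reference it was pinning. */
static void hotplug_cpu_dead(struct obj_stock *stock)
{
	struct objcg *old = drain_obj_stock(stock);

	if (old) /* this model guards against NULL in the caller */
		objcg_put(old);
}
```

  In this model, after refill_stock(&stock, &cg, 128) the stock pins an
  extra reference; without the hotplug_cpu_dead() call, that reference and
  the 128 cached bytes would never be released, which is the leak the
  commit message describes.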

  INFO | START ubuntu_ftrace_smoke_test.ftrace-smoke-test ubuntu_ftrace_smoke_test.ftrace-smoke-test timeout=900 timestamp=1756180477 localtime=Aug 26 03:54:37
  DEBUG| Persistent state client._record_indent now set to 2
  DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_ftrace_smoke_test.ftrace-smoke-test', 'ubuntu_ftrace_smoke_test.ftrace-smoke-test')
  DEBUG| Waiting for pid 3906 for 900 seconds
  WARNI| System python is too old, crash handling disabled
  DEBUG| Running '/home/ubuntu/autotest/client/tests/ubuntu_ftrace_smoke_test/ubuntu_ftrace_smoke_test.sh'
  DEBUG| [stdout] PASSED (CONFIG_FUNCTION_TRACER=y in /boot/config-6.8.0-80-generic)
  DEBUG| [stdout] PASSED (CONFIG_FUNCTION_GRAPH_TRACER=y in /boot/config-6.8.0-80-generic)
  DEBUG| [stdout] PASSED (CONFIG_STACK_TRACER=y in /boot/config-6.8.0-80-generic)
  DEBUG| [stdout] PASSED (CONFIG_DYNAMIC_FTRACE=y in /boot/config-6.8.0-80-generic)
  DEBUG| [stdout] PASSED all expected /sys/kernel/debug/tracing files exist
  DEBUG| [stdout] PASSED (function_graph in /sys/kernel/debug/tracing/available_tracers)
  DEBUG| [stdout] PASSED (function in /sys/kernel/debug/tracing/available_tracers)
  DEBUG| [stdout] PASSED (nop in /sys/kernel/debug/tracing/available_tracers)
  DEBUG| [stdout] PASSED (tracer function can be enabled)
  DEBUG| [stdout] PASSED (tracer function_graph can be enabled)
  ERROR| [stderr] grep: /tmp/ftrace-kernel-trace-3910.tmp.log: binary file matches
  DEBUG| [stdout] - tracer function_graph got enough data
  DEBUG| [stdout] - tracer function_graph completed
  DEBUG| [stdout] - tracer function_graph being turned off
  ERROR| [stderr] grep: /tmp/ftrace-kernel-trace-3910.tmp.log: binary file matches
  DEBUG| [stdout] - tracer got 231 irq events
  DEBUG| [stdout] - tracer timerlat got enough data
  DEBUG| [stdout] - tracer timerlat completed
  DEBUG| [stdout] - tracer timerlat being turned off
  DEBUG| [stdout] - tracer nop being set as current tracer
  DEBUG| [stdout] PASSED (tracer timerlat can be enabled (got 660 lines of tracing output))
  DEBUG| [stdout] - tracer osnoise got enough data
  DEBUG| [stdout] - tracer osnoise completed
  DEBUG| [stdout] - tracer osnoise being turned off
  DEBUG| [stdout] - tracer nop being set as current tracer
  DEBUG| [stdout] PASSED (tracer osnoise can be enabled (got 11 lines of tracing output))
  DEBUG| [stdout] - tracer hwlat got enough data
  DEBUG| [stdout] - tracer hwlat completed
  DEBUG| [stdout] - tracer hwlat being turned off
  DEBUG| [stdout] - tracer nop being set as current tracer
  DEBUG| [stdout] PASSED (tracer hwlat can be enabled (got 13 lines of tracing output))
  DEBUG| [stdout] - tracer blk got enough data
  DEBUG| [stdout] - tracer blk completed
  DEBUG| [stdout] - tracer blk being turned off
  DEBUG| [stdout] - tracer nop being set as current tracer
  DEBUG| [stdout] PASSED (tracer blk can be enabled (got 2 lines of tracing output))
  DEBUG| [stdout] TIMER END Tue Aug 26 03:58:59 UTC 2025
  DEBUG| [stdout] TIMEOUT
  DEBUG| [stdout] FAILED: aborting, timeout, took way too long to complete
  INFO | Timer expired (900 sec.), nuking pid 3906
  INFO | ERROR ubuntu_ftrace_smoke_test.ftrace-smoke-test ubuntu_ftrace_smoke_test.ftrace-smoke-test timestamp=1756181377 localtime=Aug 26 04:09:37 Test timeout expired, rc=15
  INFO | END ERROR ubuntu_ftrace_smoke_test.ftrace-smoke-test ubuntu_ftrace_smoke_test.ftrace-smoke-test timestamp=1756181377 localtime=Aug 26 04:09:37

  Running 'sudo chcpu -d 1-95' results in:

  [   82.891707] BUG: kernel NULL pointer dereference, address: 0000000000000000
  [   82.891959] #PF: supervisor read access in kernel mode
  [   82.891959] #PF: error_code(0x0000) - not-present page
  [   82.891959] PGD 0 P4D 0
  [   82.891959] Oops: 0000 [#1] PREEMPT SMP NOPTI
  [   82.891959] CPU: 0 PID: 593 Comm: kworker/0:2 Not tainted 6.8.0-80-generic #80-Ubuntu
  [   82.891959] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  [   82.891959] Workqueue: events work_for_cpu_fn
  [   82.891959] RIP: 0010:memcg_hotplug_cpu_dead+0x65/0xc0
  [   82.891959] Code: 44 00 00 48 89 df e8 5a ef ff ff 48 89 c3 41 f7 c5 00 02 00 00 74 06 fb 0f 1f 44 00 00 4c 89 e7 e8 f0 cd ff ff e8 6b d9 d0 ff <48> 8b 03 a8 03 75 1e 65 48 ff 08 e8 ab 35 d1 ff 31 c0 5b 41 5c 41
  [   82.891959] RSP: 0018:ffffbd548170bd10 EFLAGS: 00000246
  [   82.891959] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  [   82.891959] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  [   82.891959] RBP: ffffbd548170bd28 R08: 0000000000000000 R09: 0000000000000000
  [   82.891959] R10: 000000000000001c R11: 0000000000000000 R12: ffff99183bcb0c00
  [   82.891959] R13: 0000000000000286 R14: 0000000000000001 R15: 0000000000000000
  [   82.891959] FS:  0000000000000000(0000) GS:ffff99183bc00000(0000) knlGS:0000000000000000
  [   82.891959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   82.891959] CR2: 0000000000000000 CR3: 000000001c43c000 CR4: 00000000000006f0
  [   82.891959] Call Trace:
  [   82.891959]  <TASK>
  [   82.891959]  ? show_regs+0x6d/0x80
  [   82.891959]  ? __die+0x24/0x80
  [   82.891959]  ? page_fault_oops+0x99/0x1b0
  [   82.891959]  ? kernelmode_fixup_or_oops.isra.0+0x69/0x90
  [   82.891959]  ? __bad_area_nosemaphore+0x19e/0x2c0
  [   82.891959]  ? bad_area_nosemaphore+0x16/0x30
  [   82.891959]  ? do_user_addr_fault+0x29d/0x670
  [   82.891959]  ? exc_page_fault+0x83/0x1b0
  [   82.891959]  ? asm_exc_page_fault+0x27/0x30
  [   82.891959]  ? memcg_hotplug_cpu_dead+0x65/0xc0
  [   82.891959]  ? __pfx_memcg_hotplug_cpu_dead+0x10/0x10
  [   82.891959]  cpuhp_invoke_callback+0x348/0x530
  [   82.891959]  __cpuhp_invoke_callback_range+0x80/0x100
  [   82.891959]  _cpu_down+0xfb/0x280
  [   82.891959]  __cpu_down_maps_locked+0x15/0x30
  [   82.891959]  work_for_cpu_fn+0x1a/0x30
  [   82.891959]  process_one_work+0x184/0x3a0
  [   82.891959]  worker_thread+0x306/0x440
  [   82.891959]  ? _raw_spin_lock_irqsave+0xe/0x20
  [   82.891959]  ? __pfx_worker_thread+0x10/0x10
  [   82.891959]  kthread+0xf2/0x120
  [   82.891959]  ? __pfx_kthread+0x10/0x10
  [   82.891959]  ret_from_fork+0x47/0x70
  [   82.891959]  ? __pfx_kthread+0x10/0x10
  [   82.891959]  ret_from_fork_asm+0x1b/0x30
  [   82.891959]  </TASK>
  [   82.891959] Modules linked in: kvm_amd ccp kvm irqbypass input_leds psmouse ahci libahci serio_raw overlay 9pnet_virtio virtiofs 9p 9pnet netfs
  [   82.891959] CR2: 0000000000000000
  [   82.891959] ---[ end trace 0000000000000000 ]---
  [   82.891959] RIP: 0010:memcg_hotplug_cpu_dead+0x65/0xc0
  [   82.891959] Code: 44 00 00 48 89 df e8 5a ef ff ff 48 89 c3 41 f7 c5 00 02 00 00 74 06 fb 0f 1f 44 00 00 4c 89 e7 e8 f0 cd ff ff e8 6b d9 d0 ff <48> 8b 03 a8 03 75 1e 65 48 ff 08 e8 ab 35 d1 ff 31 c0 5b 41 5c 41
  [   82.891959] RSP: 0018:ffffbd548170bd10 EFLAGS: 00000246
  [   82.891959] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  [   82.891959] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  [   82.891959] RBP: ffffbd548170bd28 R08: 0000000000000000 R09: 0000000000000000
  [   82.891959] R10: 000000000000001c R11: 0000000000000000 R12: ffff99183bcb0c00
  [   82.891959] R13: 0000000000000286 R14: 0000000000000001 R15: 0000000000000000
  [   82.891959] FS:  0000000000000000(0000) GS:ffff99183bc00000(0000) knlGS:0000000000000000
  [   82.891959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   82.891959] CR2: 0000000000000000 CR3: 000000001c43c000 CR4: 00000000000006f0
  [   82.891959] note: kworker/0:2[593] exited with irqs disabled

  [Fix]

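  The offending commit relies on a NULL check in obj_cgroup_put() that was
  introduced by an earlier upstream commit, which the noble kernel does not
  have. This matches the oops above: the faulting instruction <48> 8b 03
  (mov (%rbx),%rax) reads through RBX, which is 0. Pull that commit in:
  91b71e78b8e4 ("mm: memcg: add NULL check to obj_cgroup_put()")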
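  The pattern the fix restores can be illustrated with a small userspace
  sketch. It assumes a hypothetical model: a plain counter stands in for the
  percpu_ref in struct obj_cgroup, and objcg_put() and hotplug_cpu_dead() are
  illustrative names, not the kernel's code. The put tolerates NULL, so a
  teardown path can hand it whatever the dying CPU had cached, including
  nothing at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a plain counter stands in for the percpu_ref
 * embedded in struct obj_cgroup. */
struct objcg {
	long refcnt;
};

/* NULL-tolerant put, mirroring the check that 91b71e78b8e4 adds.
 * Without the early return, put(NULL) would read through address 0. */
static void objcg_put(struct objcg *objcg)
{
	if (!objcg)
		return;
	assert(objcg->refcnt > 0); /* model-only underflow check */
	objcg->refcnt--;
}

/* Model of a teardown path that detaches whatever this CPU cached
 * (possibly nothing) and drops it without checking for NULL itself. */
static void hotplug_cpu_dead(struct objcg **cached)
{
	struct objcg *old = *cached;

	*cached = NULL;
	objcg_put(old); /* safe only because objcg_put() accepts NULL */
}
```

  With an empty stock, hotplug_cpu_dead() passes NULL to objcg_put(), which
  returns without touching memory; with a cached objcg, the reference is
  dropped as before.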

  [Test Case]

  Running 'sudo chcpu -d 1-95' should not trigger a kernel BUG.
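  chcpu -d offlines CPUs through sysfs, by writing 0 to
  /sys/devices/system/cpu/cpuN/online. A rough C equivalent of this test
  step is sketched below; cpu_online_path(), set_cpu_online() and
  offline_cpu_range() are hypothetical helpers, the program must run as
  root, and CPUs that cannot be hot-unplugged are simply reported as
  failures.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/devices/system/cpu/cpuN/online" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int cpu_online_path(char *buf, size_t len, int cpu)
{
	int n;

	assert(cpu >= 0);
	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "0" (offline) or "1" (online) to the CPU's sysfs file.
 * Needs root; returns -1 with errno set on failure. */
static int set_cpu_online(int cpu, int online)
{
	char path[64];
	FILE *f;

	if (cpu_online_path(path, sizeof(path), cpu) != 0) {
		errno = ENAMETOOLONG;
		return -1;
	}
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(online ? "1" : "0", f) == EOF) {
		int saved = errno;

		fclose(f);
		errno = saved;
		return -1;
	}
	/* stdio buffers the write, so a kernel error may only surface here. */
	return fclose(f) == 0 ? 0 : -1;
}

/* Offline CPUs first..last, roughly like 'chcpu -d first-last'.
 * Returns the number of CPUs that could not be offlined. */
static int offline_cpu_range(int first, int last)
{
	int failures = 0;

	for (int cpu = first; cpu <= last; cpu++) {
		if (set_cpu_online(cpu, 0) != 0) {
			fprintf(stderr, "cpu%d: %s\n", cpu, strerror(errno));
			failures++;
		}
	}
	return failures;
}
```

  Calling offline_cpu_range(1, 95) corresponds to 'sudo chcpu -d 1-95'; on
  a fixed kernel, dmesg should show no BUG afterwards.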

  [Where Problems Could Occur]

  This touches the memcg CPU hotplug teardown path, so any regression would
  most likely show up when CPUs are taken offline or brought back online.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2121673/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
