> On 1 Jun 2026, at 2:46 PM, Shrikanth Hegde <[email protected]> wrote:
>
> Hi Venkat. Thanks for the report.
>
> + mukesh, ritesh
>
> On 6/1/26 12:11 PM, Venkat Rao Bagalkote wrote:
>> Greetings!!!
>> I hit a kernel BUG on a linux-next kernel running on ppc64le (Power11 LPAR).
>> The issue was observed once in CI (Avocado tests) and I haven’t been able to
>> reproduce it reliably yet.
>
> Can you run with lockdep and see if you can hit it?
I did run with lockdep and didn't hit this issue. I will try a few more
times; if I hit it, I will respond here.
Meanwhile, I hit another boot warning with lockdep enabled, which is reported
here [1].
[1]:
https://lore.kernel.org/all/[email protected]/
Regards,
Venkat.
>
>> Architecture: ppc64le (Power11, pSeries)
>> Kernel: 7.1.0-rc5-next-20260529
>> Config: PREEMPT(lazy)
>> CPUs: large system (NR_CPUS=8192)
>
> This is with GENERIC_ENTRY.
>
>> So far, I have not reproduced the crash, but I am trying to stress similar
>> conditions using:
>> - parallel read workloads (fio / dd)
>> - memory pressure
>>
>> Traces:
>> (5/8) /home/upstreamci/avocado-fvt-wrapper/tests/avocado-misc-tests/cpu/ppc64_cpu_test.py:PPC64Test.test_smt_loop;run-run_type-upstream-9cfe: STARTED
>> [ 1885.176400] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1885.296164] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1885.386120] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1885.556134] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1886.576119] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1886.806060] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1887.026051] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1887.456075] ------------[ cut here ]------------
>> [ 1887.456101] kernel BUG at kernel/sched/core.c:7512!
>> [ 1887.456107] Oops: Exception in kernel mode, sig: 5 [#1]
>> [ 1887.456111] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
>> [ 1887.456116] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
>> nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
>> nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls
>> ip_set rfkill nf_tables fsdev_dax kmem device_dax pseries_rng vmx_crypto
>> dax_pmem fuse ext4 crc16 mbcache jbd2 sd_mod nd_pmem papr_scm sg libnvdimm
>> ibmvscsi ibmveth scsi_transport_srp pseries_wdt
>> [ 1887.456173] CPU: 28 UID: 0 PID: 85305 Comm: kexec Not tainted 7.1.0-rc5-next-20260529 #1 PREEMPT(lazy)
>> [ 1887.456180] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
>> [ 1887.456185] NIP: c0000000013a8e8c LR: c0000000003483bc CTR: 0000000000000000
>> [ 1887.456190] REGS: c000000069f03070 TRAP: 0700 Not tainted (7.1.0-rc5-next-20260529)
>> [ 1887.456195] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24428222 XER: 0000005a
>> [ 1887.456208] CFAR: c0000000003483b8 IRQMASK: 0
>> [ 1887.456208] GPR00: c0000000003483bc c000000069f03330 c000000001a82100 c000000069f033e0
>> [ 1887.456208] GPR04: 0000000000000000 0000000000000001 0000000000000001 c000000006dd3b00
>> [ 1887.456208] GPR08: ffffffffffffff00 0000000000000001 0000000000000000 0000000024428220
>> [ 1887.456208] GPR12: 0000000000000300 c000000effdbef00 0000000000000000 0000000000000000
>> [ 1887.456208] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456208] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456208] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456208] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000069f033e0
>> [ 1887.456265] NIP [c0000000013a8e8c] preempt_schedule_irq+0x44/0x118
>> [ 1887.456274] LR [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
>> [ 1887.456282] Call Trace:
>> [ 1887.456284] [c000000069f03360] [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
>> [ 1887.456291] [c000000069f03380] [c00000000014f3bc] do_page_fault+0xc0/0x104
>> [ 1887.456298] [c000000069f033b0] [c000000000008be0] data_access_common_virt+0x210/0x220
>> [ 1887.456306] ---- interrupt: 300 at __copy_tofrom_user_base+0xac/0x5a4
>> [ 1887.456313] NIP: c00000000017fc38 LR: c000000000aaa684 CTR: 0000000000000000
>> [ 1887.456317] REGS: c000000069f033e0 TRAP: 0300 Not tainted (7.1.0-rc5-next-20260529)
>> [ 1887.456322] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24428220 XER: 2004005a
>> [ 1887.456334] CFAR: c00000000017fc34 DAR: 00003fff879a8000 DSISR: 42000000 IRQMASK: 0
>> [ 1887.456334] GPR00: 0000000000000000 c000000069f036a0 c000000001a82100 00003fff879a8000
>> [ 1887.456334] GPR04: c0000000bb314ff0 0000000000001000 69f0000606480600 0200c4080368f028
>> [ 1887.456334] GPR08: 09036af00005d9c4 0600000200e80803 0000000000000000 0000000000000030
>> [ 1887.456334] GPR12: 0000000000000040 c000000effdbef00 0000000000000000 000000000000000e
>> [ 1887.456334] GPR16: 0000000004a00000 000000000000001f c000000069f038a0 c00000006e73e500
>> [ 1887.456334] GPR20: c00000006f0ff6a8 0000000000000000 c00000006f0ff540 0000000000000001
>> [ 1887.456334] GPR24: 000000001816ce60 c0000000bb314000 c000000002e48730 c000000069f03a30
>> [ 1887.456334] GPR28: c0000000bb314000 00003fff879a7010 0000000000000010 0000000000001000
>> [ 1887.456393] NIP [c00000000017fc38] __copy_tofrom_user_base+0xac/0x5a4
>> [ 1887.456399] LR [c000000000aaa684] raw_copy_to_user+0x12c/0x314
>> [ 1887.456405] ---- interrupt: 300
>> [ 1887.456408] [c000000069f036a0] [c000000000aaa5f4] raw_copy_to_user+0x9c/0x314 (unreliable)
>> [ 1887.456416] [c000000069f036e0] [c000000000aacd08] _copy_to_iter+0xe4/0x79c
>> [ 1887.456423] [c000000069f037a0] [c000000000ab01ec] copy_page_to_iter+0xd4/0x1a4
>> [ 1887.456429] [c000000069f037f0] [c0000000005ddc34] filemap_read+0x420/0x4f0
>> [ 1887.456436] [c000000069f039c0] [c0080000043443e0] ext4_file_read_iter+0x78/0x31c [ext4]
>> [ 1887.456517] [c000000069f03a10] [c000000000796498] vfs_read+0x2a8/0x3c8
>> [ 1887.456524] [c000000069f03ac0] [c00000000079726c] ksys_read+0x88/0x140
>> [ 1887.456530] [c000000069f03b10] [c000000000032f98] system_call_exception+0x198/0x4e0
>> [ 1887.456537] [c000000069f03e30] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
>> [ 1887.456544] ---- interrupt: 3000 at 0x3fff9b133cf4
>> [ 1887.456549] NIP: 00003fff9b133cf4 LR: 00003fff9b133cf4 CTR: 0000000000000000
>> [ 1887.456554] REGS: c000000069f03e60 TRAP: 3000 Not tainted (7.1.0-rc5-next-20260529)
>> [ 1887.456558] MSR: 800000000000f033 <SF,EE,PR,FP,ME,IR,DR,RI,LE> CR: 44424402 XER: 00000000
>> [ 1887.456572] IRQMASK: 0
>> [ 1887.456572] GPR00: 0000000000000003 00003fffe5fb4190 0000000105087f00 0000000000000003
>> [ 1887.456572] GPR04: 00003fff82e93010 000000001816ce60 0000000000000022 0000000000000000
>> [ 1887.456572] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456572] GPR12: 0000000000000000 00003fff9b4cd860 000000010507f588 0000000000000000
>> [ 1887.456572] GPR16: ffffffffffffffff 0000000000000000 0000000000000006 0000000000000000
>> [ 1887.456572] GPR20: 0000000000000001 00003fff9b23039c 00003fff9b2303a0 00003fffe5fb5ee7
>> [ 1887.456572] GPR24: 0000000000000000 0000000000000000 00003fffe5fb5ee7 00003fffe5fb42d0
>> [ 1887.456572] GPR28: 0000000000000003 00003fff82e93010 000000001816ce60 0000000000000000
>> [ 1887.456626] NIP [00003fff9b133cf4] 0x3fff9b133cf4
>> [ 1887.456630] LR [00003fff9b133cf4] 0x3fff9b133cf4
>> [ 1887.456634] ---- interrupt: 3000
>> [ 1887.456637] Code: fbe1fff8 e92d0128 f8010010 f821ffd1 81490000 39200001 2c0a0000 40820014 892d0152 552907fe 7d290034 5529d97e <0b090000> 60000000 3bc00000 ebed0128
>> [ 1887.456657] ---[ end trace 0000000000000000 ]---
>> If you happen to fix this, please add the tag below.
>> Reported-by: Venkat Rao Bagalkote <[email protected]>
>
> Ritesh, Mukesh, is the scenario below possible?
>
> do_page_fault seems to enable IRQs in the interrupt handler. Is that
> expected? If so, one might see:
>
> -- do_page_fault (taken in kernel mode)
> -- enables interrupts
> -- takes an interrupt, which sets need_resched
> -- irqentry_exit sees it is returning to kernel mode, checks only the
>    preempt count, and calls preempt_schedule_irq, whose BUG_ON() fires
>    on preempt_count() || !irqs_disabled(). Hence the BUG?
>
> Should do_page_fault do preempt_disable when it enables the interrupts?