On 05/11/25 7:15 pm, Christophe Leroy wrote:
On 23/10/2025 at 06:54, Venkat Rao Bagalkote wrote:
Greetings!!!
IBM CI has reported a kernel crash while running mce selftests on
mainline kernel, from tools/testing/selftests/powerpc/mce/.
This issue is hit when CONFIG_KASAN is enabled. If it is disabled, the
test passes.
Traces:
[ 8041.225432] BUG: Unable to handle kernel data access on read at
0xc00e0001a1ad6103
[ 8041.225453] Faulting instruction address: 0xc0000000008c54d8
[ 8041.225461] Oops: Kernel access of bad area, sig: 11 [#1]
[ 8041.225467] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
[ 8041.225475] Modules linked in: nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack bonding tls
nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink
pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 nd_pmem
papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth scsi_transport_srp
pseries_wdt
[ 8041.225558] CPU: 17 UID: 0 PID: 877869 Comm: inject-ra-err Kdump:
loaded Not tainted 6.18.0-rc2+ #1 VOLUNTARY
[ 8041.225569] Hardware name: IBM,9080-HEX Power11 (architected)
0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 8041.225576] NIP: c0000000008c54d8 LR: c00000000004e464 CTR:
0000000000000000
[ 8041.225590] REGS: c0000000fff778d0 TRAP: 0300 Not tainted
(6.18.0-rc2+)
[ 8041.225590] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 48002828
XER: 00000000
[ 8041.225607] CFAR: c00000000004e460 DAR: c00e0001a1ad6103 DSISR:
40000000 IRQMASK: 3
[ 8041.225607] GPR00: c0000000019d0598 c0000000fff77b70
c00000000244a400 c000000d0d6b0818
[ 8041.225607] GPR04: 0000000000004d43 0000000000000008
c00000000004e464 004d424900000000
[ 8041.225607] GPR08: 0000000000000001 18000001a1ad6103
a80e000000000000 0000000003000048
[ 8041.225607] GPR12: 0000000000000000 c000000d0ddf3300
0000000000000000 0000000000000000
[ 8041.225607] GPR16: 0000000000000000 0000000000000000
0000000000000000 0000000000000000
[ 8041.225607] GPR20: 0000000000000000 0000000000000000
0000000000000000 0000000000000000
[ 8041.225607] GPR24: 0000000000000000 0000000000000000
0000000000000000 0000000000000000
[ 8041.225607] GPR28: c000000d0d6b0888 c000000d0d6b0800
0000000000004d43 c000000d0d6b0818
[ 8041.225701] NIP [c0000000008c54d8] __asan_load2+0x54/0xd8
[ 8041.225712] LR [c00000000004e464] pseries_errorlog_id+0x20/0x3c
[ 8041.225722] Call Trace:
[ 8041.225726] [c0000000fff77b90] [c0000000001f8748]
fwnmi_get_errinfo+0xd4/0x104
[ 8041.225738] [c0000000fff77bc0] [c0000000019d0598]
get_pseries_errorlog+0xa8/0x110
[ 8041.225750] [c0000000fff77c00] [c0000000001f8f68]
pseries_machine_check_realmode+0x11c/0x214
[ 8041.225762] [c0000000fff77ce0] [c000000000049ca4]
machine_check_early+0x74/0xc0
[ 8041.225771] [c0000000fff77d30] [c0000000000084a4]
machine_check_early_common+0x1b4/0x2c0
Is it a new problem or has it always been there?
It's not a new problem. I enabled KASAN in the config recently, and
that is when I started seeing this issue.
I have tested 6.17, 6.16 and 6.15 kernels, and the issue is present in
all of them.
Regards,
Venkat.
The problem is that KASAN is not compatible with real mode (MMU
translation is OFF).
pseries_machine_check_realmode() is located in
arch/powerpc/platforms/pseries/ras.c, which is built with
KASAN_SANITIZE_ras.o := n.
But pseries_machine_check_realmode() calls mce_handle_error(), which
calls get_pseries_errorlog().
get_pseries_errorlog() is in arch/powerpc/kernel/rtas.c, which is _not_
built with KASAN_SANITIZE disabled, hence the Oops.
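For illustration, the register dump appears consistent with this: the
faulting address (DAR) matches what the generic KASAN shadow mapping,
shadow = (addr >> 3) + offset, would give for the pointer in GPR03. The
sketch below only redoes that arithmetic with values copied from the
trace. Treating GPR10 as the shadow offset and using a scale shift of 3
are assumptions read off the dump, not something confirmed from the
kernel config.

```python
# Values copied from the register dump in the Oops above.
GPR03_ADDR = 0xC000000D0D6B0818    # pointer being checked by __asan_load2
GPR10_OFFSET = 0xA80E000000000000  # ASSUMPTION: KASAN shadow offset
DAR = 0xC00E0001A1AD6103           # faulting data address from the trace

# ASSUMPTION: generic KASAN granularity, one shadow byte per 8 bytes.
KASAN_SHADOW_SCALE_SHIFT = 3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def shadow_address(addr, offset):
    """Return the 64-bit shadow address for addr under the assumed mapping."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + offset) & MASK_64


shadow = shadow_address(GPR03_ADDR, GPR10_OFFSET)
print(hex(shadow))
print("matches DAR:", shadow == DAR)
```

If the assumption holds, the load that faults is the shadow-memory
lookup itself: the MSR in the trace shows neither IR nor DR set, so the
shadow address is not being translated at that point.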
Unrelated, but it looks like there is also a problem with commit
cc15ff327569 ("powerpc/mce: Avoid using irq_work_queue() in
realmode"), which removed the re-enabling of translation but left the
call to mce_handle_err_virtmode().
Christophe