On 26/05/26 9:38 am, Chen, Yu C wrote:
Hi Venkat,
On 5/26/2026 11:14 AM, Srikar Dronamraju wrote:
* Chen, Yu C <[email protected]> [2026-05-25 23:35:45]:
Hi Venkat,
On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
Greetings!!!
I am seeing an early-boot kernel panic due to a NULL pointer dereference
on a POWER9 (pSeries) system when testing linux-next (next-20260522).
The same issue is seen on P11 as well.
[ 0.006697] smp: Brought up 1 node, 16 CPUs
[ 0.006702] Big cores detected but using small core scheduling
[ 0.006752] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 0.006755] Faulting instruction address: 0xc000000020adbb6c
[ 0.006759] Oops: Kernel access of bad area, sig: 7 [#1]
[ 0.006762] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
[ 0.006767] Modules linked in:
[ 0.006772] CPU: 4 UID: 0 PID: 1 Comm: swapper/4 Not tainted 7.1.0-rc5-next-20260525 #1 PREEMPT(lazy)
[ 0.006777] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 0.006781] NIP: c000000020adbb6c LR: c0000000202e5a58 CTR: 0000000000000000
[ 0.006784] REGS: c0000000283d7890 TRAP: 0300 Not tainted (7.1.0-rc5-next-20260525)
[ 0.006788] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44002242 XER: 20040003
[ 0.006796] CFAR: c0000000202e5a54 DAR: 0000000000000000 DSISR: 00080000 IRQMASK: 0
[ 0.006796] GPR00: 0000000000000000 c0000000283d7b50 c000000021abf100 0000000000000010
[ 0.006796] GPR04: 0000000000000010 0000000000000030 0000000000000000 c000000028365500
[ 0.006796] GPR08: 0000000000000000 c000000022213598 000000003b77d000 0000000000000000
[ 0.006796] GPR12: c00000002005d8f0 c000000000008000 c0000000283cb578 c0000000283cb400
[ 0.006796] GPR16: c0000000283c9000 c000000022218b20 c0000000222330e8 00000000ffffffff
[ 0.006796] GPR20: fffffffffffffff6 0000000000000000 c000000022da36e0 0000000000000000
[ 0.006796] GPR24: 0000000000000000 0000000000000000 c0000000283c9178 c0000000227b5f00
[ 0.006796] GPR28: c00000002831c1e8 c000000022db5980 0000000000000000 0000000000000000
[ 0.006835] NIP [c000000020adbb6c] _find_first_bit+0xc/0xc0
[ 0.006842] LR [c0000000202e5a58] build_sched_domains+0x7d8/0xb40
[ 0.006847] Call Trace:
[ 0.006849] [c0000000283d7b50] [c0000000202e5408] build_sched_domains+0x188/0xb40 (unreliable)
[ 0.006854] [c0000000283d7c90] [c000000022034380] sched_init_domains+0x118/0x168
[ 0.006860] [c0000000283d7ce0] [c000000022032b14] sched_init_smp+0xa8/0x158
[ 0.006865] [c0000000283d7d30] [c000000022005674] kernel_init_freeable+0x1ac/0x294
[ 0.006870] [c0000000283d7dd0] [c000000020011718] kernel_init+0x2c/0x1c4
[ 0.006874] [c0000000283d7e30] [c00000002000debc] ret_from_kernel_user_thread+0x14/0x1c
[ 0.006878] ---- interrupt: 0 at 0x0
[ 0.006881] Code: eb610038 7fc3f378 eb810040 eba10048 38210060 ebc1fff0 ebe1fff8 7c0803a6 4e800020 7c681b78 7c832379 4d820020 <e9280000> 38e3ffff 39400000 78e7d7e2
[ 0.006895] ---[ end trace 0000000000000000 ]---
[ 0.006898]
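For reference, the faulting word marked <e9280000> in the Code: line can be
decoded by hand. The sketch below is a minimal userspace decoder for the
Power ISA DS-form encoding (primary opcode 58, where XO 0 is ld). It is not
kernel code, and the struct and function names are hypothetical, chosen only
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a Power ISA DS-form instruction (hypothetical helper type). */
struct ds_form {
	unsigned int opcode;	/* bits 0-5: primary opcode */
	unsigned int rt;	/* bits 6-10: target register */
	unsigned int ra;	/* bits 11-15: base address register */
	int disp;		/* bits 16-29: displacement, low two bits zero */
	unsigned int xo;	/* bits 30-31: extended opcode */
};

/* Split a 32-bit instruction word into its DS-form fields. */
static struct ds_form decode_ds_form(uint32_t insn)
{
	struct ds_form d;

	d.opcode = insn >> 26;
	d.rt = (insn >> 21) & 0x1f;
	d.ra = (insn >> 16) & 0x1f;
	/* Mask off the XO bits, then sign-extend the 16-bit displacement. */
	d.disp = (int16_t)(insn & 0xfffc);
	d.xo = insn & 0x3;
	return d;
}
```

Calling decode_ds_form(0xe9280000) gives opcode 58, RT 9, RA 8, displacement
0 and XO 0, that is "ld r9,0(r8)". GPR08 is zero in the register dump, which
is consistent with DAR being 0.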
Regards,
Venkat.
It seems that cpumask_first(llc_mask(i)) is dereferencing a NULL
cpu_coregroup_mask():
has_coregroup_support() is false, so cpu_coregroup_map
is never allocated in smp_prepare_cpus().
This machine is a "shared system" VM. We should probably
let the LLC id generation fall back to the L2 id when
cpu_coregroup_mask is unavailable (which restores the
behavior before this patch). I'm wondering if the following
change would help (I need help from our IBM friends on this):
Power9 and older systems don't have coregroups.
That is not because of shared LPARs; it is true for dedicated LPARs too.
Only Power10 and later systems have hemispheres, which is where we add
MC/coregroup support.
OK, thanks for the correction. Are you saying coregroup_enabled is false
on Power9 and older hardware, and set to true on Power10? Power10 has a
corresponding device-tree property, which is parsed to enable hemisphere
support in find_possible_nodes(). This is why has_coregroup_support()
returns true for Power10.
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3467f86fd78f..cf6c2e4190ab 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1042,11 +1042,6 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
 }
 #endif

-struct cpumask *cpu_coregroup_mask(int cpu)
-{
-	return per_cpu(cpu_coregroup_map, cpu);
-}
-
 static bool has_coregroup_support(void)
 {
 	/* Coregroup identification not available on shared systems */
@@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
 	return coregroup_enabled;
 }

+struct cpumask *cpu_coregroup_mask(int cpu)
+{
+	if (!has_coregroup_support())
+		return cpu_l2_cache_mask(cpu);
+
+	return per_cpu(cpu_coregroup_map, cpu);
+}
+
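To make the failure mode and the proposed guard concrete, here is a minimal
userspace sketch. It is an illustration under stated assumptions, not the
kernel implementation: a plain unsigned long stands in for struct cpumask,
fixed arrays stand in for the per-CPU maps, and all names (l2_map,
coregroup_map, mask_first and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 16

/* One bit per CPU; a stand-in for struct cpumask. */
typedef unsigned long mask_t;

/* Stand-ins for the per-CPU maps (hypothetical names). */
static mask_t l2_map[NR_CPUS];
static mask_t *coregroup_map[NR_CPUS];	/* stays NULL if never allocated */
static bool coregroup_enabled;		/* false models "no coregroup support" */

/* Models the original accessor: returns NULL when the map was never set up. */
static const mask_t *coregroup_mask_unguarded(int cpu)
{
	return coregroup_map[cpu];
}

/* Models the proposed accessor: falls back to the L2 mask. */
static const mask_t *coregroup_mask_guarded(int cpu)
{
	if (!coregroup_enabled)
		return &l2_map[cpu];

	return coregroup_map[cpu];
}

/* First set bit, or NR_CPUS if none; the caller must pass a non-NULL mask. */
static int mask_first(const mask_t *mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (*mask & (1UL << cpu))
			return cpu;

	return NR_CPUS;
}
```

With coregroup_enabled left false, coregroup_mask_unguarded() returns NULL,
so passing it to mask_first() would fault just as _find_first_bit() does in
the oops, while coregroup_mask_guarded() returns a usable L2 mask. Whether
the L2 mask is the right fallback on every Power generation is exactly what
the rest of this thread discusses.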
While this works around the problem on Power9,
it will hurt Power10 and Power11 systems.
As Prateek has alluded to, MC is not LLC on Power.
Could you please elaborate on the cache topology?
Specifically, could you clarify what the LLC is for Power9
and Power10 respectively? Is it always the L2 cache?
I have checked the IBM documentation available at:
https://hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_IBM_Starke_POWER10_v33.pdf
According to the document, a hemisphere corresponds to a 64MB
L3 cache shared by 8 cores. Since the MC domain spans a single
hemisphere, I wonder why the SD_SHARE_LLC flag is not enabled
for the MC domain?
So by defining llc_mask as cpu_coregroup_mask() we run into the trouble of
assuming MC to be similar to LLC, which will impact Power10/11 systems.
In commit b5ea300a17e3 ("sched/cache: Make LLC id continuous"), we define
#define llc_mask(cpu) cpu_coregroup_mask(cpu)
Defining llc_mask as cpu_coregroup_mask means MC should be LLC.
This is not true for some architectures, at least on Power.
OK.
So shouldn't it be using
#define llc_mask(cpu) per_cpu(sd_llc, cpu)
This should work for systems where the LLC is sub-coregroup, coregroup,
or super-coregroup (say some archs want LLC at PKG and cluster at
coregroup).
If we do that, I don't think we even need the else case where we say
#define llc_mask(cpu) cpumask_of(cpu)
I suppose you are referring to
sched_domain_span(per_cpu(sd_llc, cpu)).
Indeed, deriving the LLC from the SD_SHARE_LLC level offers
better scalability. However, this approach would involve scheduler
domains, which can be truncated by cpuset partitions - a scenario we
prefer to avoid.
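The concern about truncation can be illustrated with a small userspace
sketch. It assumes, for illustration only, that an LLC id is taken from the
first CPU of the LLC mask and that a scheduler-domain span is the hardware
mask clipped to the cpuset partition the CPU belongs to. This is not the
kernel's code, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

/* First set bit of a small CPU bitmask, or NR_CPUS if the mask is empty. */
static int first_cpu(uint32_t mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;

	return NR_CPUS;
}

/* Id taken from the hardware topology mask: independent of partitions. */
static int llc_id_from_topology(uint32_t hw_llc_mask)
{
	return first_cpu(hw_llc_mask);
}

/*
 * Id taken from a domain span: the span only covers the CPUs of the
 * partition, so the same hardware LLC can yield different ids.
 */
static int llc_id_from_domain(uint32_t hw_llc_mask, uint32_t partition_mask)
{
	return first_cpu(hw_llc_mask & partition_mask);
}
```

For one hardware LLC covering CPUs 0-7 (mask 0xff), the topology-based id is
0 for every CPU. If a cpuset splits the machine into partitions 0-3 and 4-7,
the domain-based id becomes 0 for the first partition and 4 for the second,
even though the cache is physically shared.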
thanks,
Chenyu