Hi Prateek,

On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak <[email protected]> 
wrote:
> Hello Srikar,
>
> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
> > L2 Cache reported here is for SMT8 Core aka CACHE domain.
>
> Apart for the scheduler, nothing in tree currently cares about
> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>
> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
> mask in this case.)
>
> Power anyways adds its own topology via set_sched_topology() so the
> default_topology from kernel/sched/topology.c remains unused.
>
> ...
>
> > Shouldnt cache-aware scheduling be worried about cpuset partitions too.
> > If a cpuset has subset of LLC cores, then we should Scheduler assume it can
> > control complete LLC?
>
> Well, the scheduling takes care of partitions and the cache aware
> scheduling bits take care of looking at the full system perspective
> for stats aggregation and pointing to a particular LLc.
>
> We don't compare llc_id across cpusets so we keeping one unique llc_id
> per H/W LLC instance is feasible and it enables us to keep llc_id space
> limited for optimizing cache-aware scheduling.
>
> Now if we have threads of same process across partitions, we'll
> still aggregate the utilization numbers across the full LLC but
> the load balancers at individual partitions will make a call on
> the aggregation.
>
> -- 
> Thanks and Regards,
> Prateek
>
>

I suppose what you suggested looks like below:

powerpc/smp: make cpu_coregroup_mask() return the LLC

On pSeries shared LPARs(or coregroup_enabled is false on
Power9 and earlier) the hemisphere map is not allocated, so
build_sched_domains() dereferences a NULL cpumask and crashes.

The generic scheduler expects cpu_coregroup_mask() to span the LLC.
On powerpc the LLC is the L2. Return cpu_l2_cache_mask() instead of
the hemisphere map. Use a coregroup_map() helper for the in-file
hemisphere users, and a powerpc_tl_mc_mask() wrapper for the MC
sched-domain level.

Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
Reported-by: Venkat Rao Bagalkote <[email protected]>
Suggested-by: K Prateek Nayak <[email protected]>
---
 arch/powerpc/kernel/smp.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1040,11 +1040,22 @@ static const struct cpumask 
*tl_smallcore_smt_mask(struct sched_domain_topology_
 }
 #endif
 
+static inline struct cpumask *coregroup_map(int cpu)
+{
+       return per_cpu(cpu_coregroup_map, cpu);
+}
+
 struct cpumask *cpu_coregroup_mask(int cpu)
 {
-       return per_cpu(cpu_coregroup_map, cpu);
+       return cpu_l2_cache_mask(cpu);
+}
+
+static const struct cpumask *
+powerpc_tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+       return coregroup_map(cpu);
 }
 
 static bool has_coregroup_support(void)
 {
        if (is_shared_processor())
@@ -1155,7 +1166,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
        cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
 
        if (has_coregroup_support())
-               cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
+               cpumask_set_cpu(boot_cpuid, coregroup_map(boot_cpuid));
 
        init_big_cores();
        if (has_big_cores) {
@@ -1520,8 +1531,8 @@ static void remove_cpu_from_masks(int cpu)
                set_cpus_unrelated(cpu, i, cpu_core_mask);
 
        if (has_coregroup_support()) {
-               for_each_cpu(i, cpu_coregroup_mask(cpu))
-                       set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
+               for_each_cpu(i, coregroup_map(cpu))
+                       set_cpus_unrelated(cpu, i, coregroup_map);
        }
 }
 #endif
@@ -1553,7 +1564,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t 
*mask)
        if (!*mask) {
                /* Assume only siblings are part of this CPU's coregroup */
                for_each_cpu(i, submask_fn(cpu))
-                       set_cpus_related(cpu, i, cpu_coregroup_mask);
+                       set_cpus_related(cpu, i, coregroup_map);
 
                return;
        }
@@ -1561,18 +1572,18 @@ static void update_coregroup_mask(int cpu, 
cpumask_var_t *mask)
        cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
 
        /* Update coregroup mask with all the CPUs that are part of submask */
-       or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
+       or_cpumasks_related(cpu, cpu, submask_fn, coregroup_map);
 
        /* Skip all CPUs already part of coregroup mask */
-       cpumask_andnot(*mask, *mask, cpu_coregroup_mask(cpu));
+       cpumask_andnot(*mask, *mask, coregroup_map(cpu));
 
        for_each_cpu(i, *mask) {
                /* Skip all CPUs not part of this coregroup */
                if (coregroup_id == cpu_to_coregroup_id(i)) {
-                       or_cpumasks_related(cpu, i, submask_fn, 
cpu_coregroup_mask);
+                       or_cpumasks_related(cpu, i, submask_fn, coregroup_map);
                        cpumask_andnot(*mask, *mask, submask_fn(i));
                } else {
-                       cpumask_andnot(*mask, *mask, cpu_coregroup_mask(i));
+                       cpumask_andnot(*mask, *mask, coregroup_map(i));
                }
        }
 }
@@ -1733,7 +1744,7 @@ static void __init build_sched_topology(void)
 
        if (has_coregroup_support()) {
                powerpc_topology[i++] =
-                       SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
+                       SDTL_INIT(powerpc_tl_mc_mask, 
powerpc_shared_proc_flags, MC);
        }
 
        powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, 
powerpc_shared_proc_flags, PKG);
-- 
2.43.0

Thanks,
Yu

Reply via email to