On 8/29/25 1:23 PM, Valentin Schneider wrote:
On 26/08/25 12:13, Peter Zijlstra wrote:
Subject: sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 25 Aug 2025 12:02:44 +0000

Leon [1] and Vinicius [2] noted a topology_span_sane() warning during
their testing starting from v6.16-rc1. Debug that followed pointed to
the tl->mask() for the NODE domain being incorrectly resolved to that of
the highest NUMA domain.

tl->mask() for NODE is set to the sd_numa_mask() which depends on the
global "sched_domains_curr_level" hack. "sched_domains_curr_level" is
set to the "tl->numa_level" during tl traversal in build_sched_domains()
calling sd_init() but was not reset before topology_span_sane().

Since "tl->numa_level" still reflected the old value from
build_sched_domains(), topology_span_sane() for the NODE domain trips
when the span of the last NUMA domain overlaps.

Instead of replicating the "sched_domains_curr_level" hack, get rid of
it entirely and instead, pass the entire "sched_domain_topology_level"
object to tl->cpumask() function to prevent such mishap in the future.

sd_numa_mask() now directly references "tl->numa_level" instead of
relying on the global "sched_domains_curr_level" hack to index into
sched_domains_numa_masks[].
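
The hazard described above can be reproduced in a small user-space sketch. This is not the kernel code: `numa_masks`, `struct topo_level`, `curr_level`, `mask_from_global()`, `mask_from_tl()` and `demo()` are hypothetical stand-ins for sched_domains_numa_masks[], the topology level object, the removed global, and the two styles of mask callback. It only illustrates why a lookup keyed on a global goes stale between two passes, while a lookup keyed on the object passed in does not.

```c
#include <assert.h>

#define NR_LEVELS 3

/* Hypothetical stand-in for sched_domains_numa_masks[]: one CPU bitmask per level. */
static const unsigned long numa_masks[NR_LEVELS] = { 0x03UL, 0x0fUL, 0xffUL };

/* Hypothetical, heavily reduced topology level. */
struct topo_level {
	const char *name;
	int numa_level;
};

/* Old style: the mask lookup depends on a global the caller must set first. */
static int curr_level;

static unsigned long mask_from_global(void)
{
	return numa_masks[curr_level];
}

/* New style: the level travels with the object handed to the callback. */
static unsigned long mask_from_tl(const struct topo_level *tl)
{
	return numa_masks[tl->numa_level];
}

/* Walk the levels once, then look up NODE again without resetting the global. */
static void demo(void)
{
	const struct topo_level tls[NR_LEVELS] = {
		{ "NODE", 0 }, { "NUMA1", 1 }, { "NUMA2", 2 },
	};
	int i;

	/* First pass: the global is updated per level, as in the build loop. */
	for (i = 0; i < NR_LEVELS; i++)
		curr_level = tls[i].numa_level;

	/* Second pass: the global still holds the last level, so NODE is wrong. */
	assert(mask_from_global() == numa_masks[NR_LEVELS - 1]);

	/* Passing the level object resolves NODE to its own mask. */
	assert(mask_from_tl(&tls[0]) == numa_masks[0]);
}
```

Calling demo() from a main() exercises both lookups; the first assertion documents the stale result, the second the per-object result.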


Eh, of course I see this *after* looking at the v6 patch.

I tested this again for good measure, but given I only test this under
x86 and the changes with v6 are in s390/ppc, I didn't expect to see much
change :-)

Reviewed-by: Valentin Schneider <vschn...@redhat.com>
Tested-by: Valentin Schneider <vschn...@redhat.com>


I was looking at: 
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core

The current code doesn't allow one to enable/disable SCHED_MC on ppc, since it
is always selected in Kconfig.
I used the patch below.

Since the config option is there, I think it would be good to provide an
option to disable it, no?

---

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fc0d1c19f5a1..da5b2f8d3686 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -170,9 +170,8 @@ config PPC
        select ARCH_STACKWALK
        select ARCH_SUPPORTS_ATOMIC_RMW
        select ARCH_SUPPORTS_DEBUG_PAGEALLOC    if PPC_BOOK3S || PPC_8xx
-       select ARCH_SUPPORTS_SCHED_SMT          if PPC64 && SMP
        select ARCH_SUPPORTS_SCHED_MC           if PPC64 && SMP
-       select SCHED_MC                         if ARCH_SUPPORTS_SCHED_MC
+       select ARCH_SUPPORTS_SCHED_SMT          if PPC64 && SMP
        select ARCH_USE_BUILTIN_BSWAP
        select ARCH_USE_CMPXCHG_LOCKREF         if PPC64
        select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 68edb66c2964..458ec5bd859e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1706,10 +1706,12 @@ static void __init build_sched_topology(void)
                        SDTL_INIT(tl_cache_mask, powerpc_shared_cache_flags, CACHE);
        }
+#ifdef CONFIG_SCHED_MC
        if (has_coregroup_support()) {
                powerpc_topology[i++] =
                        SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
        }
+#endif
        powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
