On 26/08/25 12:13, Peter Zijlstra wrote:
> Subject: sched/fair: Get rid of sched_domains_curr_level hack for
> tl->cpumask()
> From: Peter Zijlstra <pet...@infradead.org>
> Date: Mon, 25 Aug 2025 12:02:44 +0000
>
> Leon [1] and Vinicius [2] noted a topology_span_sane() warning during
> their testing starting from v6.16-rc1. Debug that followed pointed to
> the tl->mask() for the NODE domain being incorrectly resolved to that of
> the highest NUMA domain.
>
> tl->mask() for NODE is set to the sd_numa_mask() which depends on the
> global "sched_domains_curr_level" hack. "sched_domains_curr_level" is
> set to the "tl->numa_level" during tl traversal in build_sched_domains()
> calling sd_init() but was not reset before topology_span_sane().
>
> Since "tl->numa_level" still reflected the old value from
> build_sched_domains(), topology_span_sane() for the NODE domain trips
> when the span of the last NUMA domain overlaps.
>
> Instead of replicating the "sched_domains_curr_level" hack, get rid of
> it entirely and instead, pass the entire "sched_domain_topology_level"
> object to tl->cpumask() function to prevent such mishap in the future.
>
> sd_numa_mask() now directly references "tl->numa_level" instead of
> relying on the global "sched_domains_curr_level" hack to index into
> sched_domains_numa_masks[].
>

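To make the failure mode in the quoted description concrete, here is a small userspace C model of it. It is a sketch under stated assumptions, not kernel code: only the names sd_numa_mask(), sched_domains_curr_level and sched_domain_topology_level come from the description; the struct layouts, the toy mask table, cpu_to_node(), build_pass(), the node_span_*() helpers and the 8-CPU / 2-node / 3-level topology are all hypothetical simplifications. It contrasts a mask callback that reads a global level (which a build pass leaves pointing at the last NUMA level) with one that takes the topology level as an argument.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model; NOT the real kernel definitions.
 * Cpumasks are plain bitmaps, 8 CPUs, 2 nodes, 3 NUMA levels.
 */
#define NR_NODES  2
#define NR_LEVELS 3

struct cpumask { unsigned long bits; };
struct sched_domain_topology_level;

/* New-style callback: the level being queried is passed in explicitly. */
typedef const struct cpumask *(*sched_domain_mask_f)(
	const struct sched_domain_topology_level *tl, int cpu);

struct sched_domain_topology_level {
	sched_domain_mask_f mask;
	int numa_level;
};

/* Toy table indexed as [numa_level][node] (hypothetical values). */
static const struct cpumask toy_numa_masks[NR_LEVELS][NR_NODES] = {
	{ { 0x0fUL }, { 0xf0UL } },	/* level 0 (NODE): each node on its own */
	{ { 0xffUL }, { 0xffUL } },	/* level 1: both nodes */
	{ { 0xffUL }, { 0xffUL } },	/* level 2: both nodes */
};

/* Hypothetical mapping: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static int cpu_to_node(int cpu) { return cpu < 4 ? 0 : 1; }

/* Old style: the level comes from hidden global state. */
static int sched_domains_curr_level;

static const struct cpumask *sd_numa_mask_old(int cpu)
{
	return &toy_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
}

/* New style: the level comes from the tl argument, no global needed. */
static const struct cpumask *
sd_numa_mask(const struct sched_domain_topology_level *tl, int cpu)
{
	assert(tl->numa_level >= 0 && tl->numa_level < NR_LEVELS);
	return &toy_numa_masks[tl->numa_level][cpu_to_node(cpu)];
}

/* One entry per NUMA level, NODE (level 0) first. */
static const struct sched_domain_topology_level topology[NR_LEVELS] = {
	{ sd_numa_mask, 0 }, { sd_numa_mask, 1 }, { sd_numa_mask, 2 },
};

/*
 * Mimics a build pass that sets the global for each level and never
 * resets it, so it is left at the last (highest) NUMA level.
 */
static void build_pass(void)
{
	for (size_t i = 0; i < NR_LEVELS; i++)
		sched_domains_curr_level = topology[i].numa_level;
}

/* A later check of the NODE level's span for @cpu, old vs new style. */
static unsigned long node_span_old(int cpu)
{
	return sd_numa_mask_old(cpu)->bits;
}

static unsigned long node_span_new(int cpu)
{
	return topology[0].mask(&topology[0], cpu)->bits;
}
```

Calling build_pass() and then querying the NODE level reproduces the symptom in miniature: the global-based lookup returns the last NUMA level's span (0xff for CPU 0), while the tl-based lookup returns the node's own span (0x0f).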
Eh, of course I see this *after* looking at the v6 patch. I tested this again for good measure, but since I only test on x86 and the v6 changes are in s390/ppc code, I didn't expect to see much change :-)

Reviewed-by: Valentin Schneider <vschn...@redhat.com>
Tested-by: Valentin Schneider <vschn...@redhat.com>