On Fri, Dec 11, 2020 at 06:44:42PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 10, 2020 at 12:58:33PM +0000, Mel Gorman wrote:
> > The prerequisite patch to make that approach work was rejected though
> > as on its own, it's not very helpful and Vincent didn't like that the
> > load_balance_mask was abused to make it effective.
> 
> So last time I poked at all this, I found that using more masks was a
> performance issue as well due to added cache misses.
> 
> Anyway, I finally found some time to look at all this, and while reading
> over the whole SiS crud to refresh the brain came up with the below...
> 
> It's still naf, but at least it gets rid of a bunch of duplicate
> scanning and LoC decreases, so it should be awesome. Of course, as
> always, benchmarks will totally ruin everything, we'll see, I started
> some.
> 
> It goes on top of Mel's first two patches (which is about as far as I
> got)...
> 

It's not free of bugs but it's interesting! The positive aspects are that
it does a single scan, inits select_idle_mask once, and caches an idle
candidate when searching for an idle core, all in a far cleaner fashion
than what I did.

One bug is in __select_idle_core() though. It's scanning the SMT mask,
not select_idle_mask, so it can return an idle candidate that is not in
p->cpus_ptr.

I queued tests anyway to see what it looks like.

There are some other potential caveats.

This is a single pass, so when test_idle_cores() is true, __select_idle_core()
is used to check all the siblings even if the core is not idle. That
could have been cut short if __select_idle_core() checked *idle_cpu !=
-1 and terminated the SMT scan once an idle candidate had already been found.

Second downside is related. If test_idle_cores() was true but no idle
CPU is found then __select_idle_core has been called enough to scan
the entire domain. In this corner case, the new code does *more* work
because the old code would have failed select_idle_core() quickly and
then select_idle_cpu() would be throttled by SIS_PROP. I think this will
only be noticeable in the heavily overloaded case but if the corner case
hits enough then the new code will be slower than the old code for the
over-saturated case (i.e. hackbench with lots of groups).

The third potential downside is that the SMT sibling is not guaranteed to
be checked due to SIS_PROP throttling but in the old code, that would have
been checked by select_idle_smt(). That might result in premature stacking
of runnable tasks on the same CPU. Similarly, as __select_idle_core may
find multiple idle candidates, it will not pick the target's SMT sibling
if it is idle the way select_idle_smt would have.

That said, I am skeptical that select_idle_smt() matters all that often.
If select_idle_cpu() has been throttled heavily then the machine is
either over-saturated or was over-saturated in the very recent past. In
that situation, the chances of the SMT siblings being free are slim.

Each of the downsides could potentially be addressed with this untested
mess:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 637f610c7059..260592d143d8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6061,7 +6061,7 @@ void __update_idle_core(struct rq *rq)
        rcu_read_unlock();
 }
 
-static int __select_idle_core(int core, struct cpumask *cpus, int *idle_cpu)
+static int __select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
        bool idle = true;
        int cpu;
@@ -6070,9 +6070,11 @@ static int __select_idle_core(int core, struct cpumask *cpus, int *idle_cpu)
                schedstat_inc(this_rq()->sis_scanned);
                if (!available_idle_cpu(cpu)) {
                        idle = false;
-                       continue;
+                       if (*idle_cpu == -1)
+                               continue;
+                       break;
                }
-               if (idle_cpu)
+               if (*idle_cpu == -1 && cpumask_test_cpu(cpu, p->cpus_ptr))
                        *idle_cpu = cpu;
        }
 
@@ -6094,7 +6096,7 @@ static inline bool test_idle_cores(int cpu, bool def)
        return def;
 }
 
-static inline int __select_idle_core(int core, struct cpumask *cpus, int *idle_cpu)
+static inline int __select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
        return -1;
 }
@@ -6142,7 +6144,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
        for_each_cpu_wrap(cpu, cpus, target) {
                if (smt) {
-                       i = __select_idle_core(cpu, cpus, &idle_cpu);
+                       i = __select_idle_core(p, cpu, cpus, &idle_cpu);
                        if ((unsigned)i < nr_cpumask_bits)
                                return i;
 

-- 
Mel Gorman
SUSE Labs
