Re: [RFC][PATCH 1/5] sched/fair: Fix select_idle_cpu()s cost accounting

Mel Gorman Sat, 09 Jan 2021 06:02:13 -0800

On Fri, Jan 08, 2021 at 09:21:48PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 08, 2021 at 10:27:38AM +0000, Mel Gorman wrote:
> 
> > 1. avg_scan_cost is now based on the average scan cost of a rq but
> >    avg_idle is still scaled to the domain size. This is a bit problematic
> >    because it's comparing scan cost of a single rq with the estimated
> >    average idle time of a domain. As a result, the scan depth can be much
> >    larger than it was before the patch and led to some regressions.
> 
> > @@ -6164,25 +6164,25 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >              */
> >             avg_idle = this_rq()->avg_idle / 512;
> >             avg_cost = this_sd->avg_scan_cost + 1;
> > -
> > -           span_avg = sd->span_weight * avg_idle;
> > -           if (span_avg > 4*avg_cost)
> > -                   nr = div_u64(span_avg, avg_cost);
> > -           else
> > +           nr = div_u64(avg_idle, avg_cost);
> > +           if (nr < 4)
> >                     nr = 4;
> 
> Oooh, could it be I simply didn't remember how that code was supposed to
> work and should kick my (much) younger self for not writing a comment?
> 
> Consider:
> 
>        span_weight * avg_idle               avg_cost
>   nr = ---------------------- = avg_idle / ----------
>                avg_cost                    span_weight
> 
> Where: avg_cost / span_weight ~= cost-per-rq
>


This would definitely make sense and I even evaluated it, but the nature
of avg_idle and the scale it works at (up to 2*sched_migration_cost)
just ended up generating lunatic values far outside the size of the
domain. Fitting that to the domain size just ended up looking silly too,
and avg_cost does not decay. Still, in principle, it's the right direction;
it's just not what the code does right now.

-- 
Mel Gorman
SUSE Labs

Re: [RFC][PATCH 1/5] sched/fair: Fix select_idle_cpu()s cost accounting
