On Thu, Aug 03, 2023 at 02:38:11PM +0200, Mark Kettenis wrote:
> > Date: Thu, 3 Aug 2023 12:56:01 +0200
> > From: Claudio Jeker <cje...@diehard.n-r-g.com>
> > 
> > On Thu, Aug 03, 2023 at 10:53:24AM +0200, Claudio Jeker wrote:
> > > On Thu, Aug 03, 2023 at 10:13:57AM +0200, Martin Pieuchot wrote:
> > > > On 02/08/23(Wed) 14:22, Claudio Jeker wrote:
> > > > > On Mon, Jul 31, 2023 at 10:21:11AM -0500, Scott Cheloha wrote:
> > > > > > On Fri, Jul 28, 2023 at 07:36:41PM -0500, Scott Cheloha wrote:
> > > > > > > claudio@ notes that uvm_loadav() pointlessly walks the
> > > > > > > allproc list to recompute schedstate_percpu.spc_nrun for
> > > > > > > each CPU.
> > > > > > > 
> > > > > > > We can just use the value instead of recomputing it.
> > > > > > 
> > > > > > Whoops, off-by-one. The current load averaging code includes
> > > > > > the running thread in the nrun count if it is *not* the idle
> > > > > > thread.
> > > > > 
> > > > > Yes, with this the loadavg seems to be consistent and following
> > > > > the number of running processes. The code seems to behave like
> > > > > before (with all its quirks).
> > > > > 
> > > > > OK claudio@, this is a good first step. Now I think this code
> > > > > should later be moved into kern_sched.c or sched_bsd.c and
> > > > > removed from uvm. Not sure why the load calculation is part of
> > > > > memory management...
> > > > > 
> > > > > On top of this I wonder about the per-CPU load calculation. In
> > > > > my opinion it is wrong to skip the calculation if the CPU is
> > > > > idle. Because of this there is no decay for idle CPUs and that
> > > > > feels wrong to me. Do we have a userland utility that reports
> > > > > spc_ldavg?
> > > > 
> > > > I don't understand why the SCHED_LOCK() is needed. Since I'm
> > > > really against adding new uses for it, could you comment on that?
> > > 
> > > The question is how sloppy do we want to be. This code looks at
> > > ci_schedstate (spc_idleproc and spc_nrun) and ci_curproc, so to be
> > > correct this needs to lock the scheduler. Do we really want that?
> > > Hell no.
> > 
> > How about this: kill the spc_ldavg calculation. Its use is more than
> > questionable. The cpu selection code using it is not working well,
> > and process stealing will do the rest.
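An aside for anyone following the off-by-one discussion upthread: the
accounting in question is each CPU's runqueue length, plus one for the
running thread unless it is the idle thread. Roughly, as a sketch with
a hypothetical helper name (field names as in sys/sched.h; not a
verbatim excerpt of uvm_loadav()):

	/*
	 * Hypothetical helper illustrating the nrun accounting discussed
	 * above: the runqueue length, plus one for the running thread
	 * unless this CPU is idle.
	 */
	static u_int
	loadav_nrun(struct cpu_info *ci)
	{
		u_int nrun = ci->ci_schedstate.spc_nrun;

		if (ci->ci_curproc != ci->ci_schedstate.spc_idleproc)
			nrun++;
		return nrun;
	}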
This is more or less what I said yesterday. The per-CPU load average
is not useful for deciding where to put a thread. Of the variables we
have available to consider, only the current length of the runqueue
is useful.

Go for it, kill it. One nit below.

> > Also use sched_cpu_idle to know if a cpu is idle.

(This is a neat trick, nice.)

> > Index: kern/kern_sched.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_sched.c,v
> > retrieving revision 1.81
> > diff -u -p -r1.81 kern_sched.c
> > --- kern/kern_sched.c	27 Jul 2023 17:52:53 -0000	1.81
> > +++ kern/kern_sched.c	3 Aug 2023 08:41:38 -0000
> > @@ -373,7 +373,6 @@ sched_choosecpu_fork(struct proc *parent
> >  {
> >  #ifdef MULTIPROCESSOR
> >  	struct cpu_info *choice = NULL;
> > -	fixpt_t load, best_load = ~0;
> >  	int run, best_run = INT_MAX;
> >  	struct cpu_info *ci;
> >  	struct cpuset set;
> > @@ -407,13 +406,10 @@ sched_choosecpu_fork(struct proc *parent
> >  	while ((ci = cpuset_first(&set)) != NULL) {
> >  		cpuset_del(&set, ci);
> >  
> > -		load = ci->ci_schedstate.spc_ldavg;
> >  		run = ci->ci_schedstate.spc_nrun;
> >  
> > -		if (choice == NULL || run < best_run ||
> > -		    (run == best_run && load < best_load)) {
> > +		if (choice == NULL || run < best_run) {
> >  			choice = ci;
> > -			best_load = load;
> >  			best_run = run;
> >  		}
> >  	}
> > @@ -605,11 +601,6 @@ sched_proc_to_cpu_cost(struct cpu_info *
> >  	 */
> >  	if (CPU_IS_PRIMARY(ci))
> >  		cost += sched_cost_runnable;
> > -
> > -	/*
> > -	 * Higher load on the destination means we don't want to go there.
> > -	 */
> > -	cost += ((sched_cost_load * spc->spc_ldavg) >> FSHIFT);
> >  
> >  	/*
> >  	 * If the proc is on this cpu already, lower the cost by how much
> > Index: sys/sched.h
> > ===================================================================
> > RCS file: /cvs/src/sys/sys/sched.h,v
> > retrieving revision 1.58
> > diff -u -p -r1.58 sched.h
> > --- sys/sched.h	25 Jul 2023 18:16:19 -0000	1.58
> > +++ sys/sched.h	3 Aug 2023 08:42:39 -0000
> > @@ -110,7 +110,6 @@ struct schedstate_percpu {
> >  	struct clockintr *spc_profclock; /* [o] profclock handle */
> >  
> >  	u_int spc_nrun;			/* procs on the run queues */
> > -	fixpt_t spc_ldavg;		/* shortest load avg. for this cpu */
> >  
> >  	volatile uint32_t spc_whichqs;
> >  	volatile u_int spc_spinning;	/* this cpu is currently spinning */
> > Index: uvm/uvm_meter.c
> > ===================================================================
> > RCS file: /cvs/src/sys/uvm/uvm_meter.c,v
> > retrieving revision 1.46
> > diff -u -p -r1.46 uvm_meter.c
> > --- uvm/uvm_meter.c	2 Aug 2023 13:54:45 -0000	1.46
> > +++ uvm/uvm_meter.c	3 Aug 2023 10:12:02 -0000
> > @@ -70,7 +70,7 @@ struct loadavg averunnable;
> >   * 5 second intervals.
> >   */
> >  
> > -static fixpt_t cexp[3] = {
> > +static const fixpt_t cexp[3] = {
> >  	0.9200444146293232 * FSCALE,	/* exp(-1/12) */
> >  	0.9834714538216174 * FSCALE,	/* exp(-1/60) */
> >  	0.9944598480048967 * FSCALE,	/* exp(-1/180) */

Yeah, this was staring me in the face, too. Please do it in a separate
patch, though.

Otherwise, ok cheloha@.
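P.S. For anyone puzzling over the cexp[] constants: they are the
per-interval decay factors for the 1-, 5-, and 15-minute averages,
i.e. 12, 60, and 180 five-second intervals. The classic fixed-point
update applies them roughly as in the sketch below, assuming
FSCALE/FSHIFT from sys/param.h and the global averunnable declared in
uvm_meter.c; an illustration, not a verbatim excerpt:

	int i;

	/*
	 * Every 5 seconds: decay each average and blend in the current
	 * number of runnable threads (nrun), all in fixed point scaled
	 * by FSCALE.
	 */
	for (i = 0; i < 3; i++)
		averunnable.ldavg[i] = (cexp[i] * averunnable.ldavg[i] +
		    nrun * FSCALE * (FSCALE - cexp[i])) >> FSHIFT;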