Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-08-03 Thread Theo de Raadt
Scott Cheloha  wrote:

> On Thu, Aug 03, 2023 at 09:09:30AM -0600, Theo de Raadt wrote:
> > Scott Cheloha  wrote:
> > 
> > > > > How about this. Kill the spc_ldavg calculation. Its use is more than
> > > > > questionable. The cpu selection code using this is not working well
> > > > > and process stealing will do the rest.
> > > 
> > > This is more or less what I said yesterday.  The per-CPU load average
> > > is not useful for deciding where to put a thread.
> > 
> > I guess you have not been reading mpi's scheduler diff.  The entire idea
> > of "placing a thread" is 1980's single-processor flawed.
> 
> Dude, I'm not talking about mpi's patch, I'm talking about what's in
> the tree.
> 
> Given the current state of the scheduler, we can throw out spc_ldavg.
> It isn't necessary.
> 
> > > Of the variables we
> > > have available to consider, only the current length of the runqueue is
> > > useful.
> > 
> > No, that concept is also broken.
> 
> Again, I am talking about the current scheduler.
> 
> Said another way: the current approach can limp along just fine using
> only the runqueue length.  We can get rid of spc_ldavg without
> worrying about it.

I'm just saying all of us need to recognize that it is impossible to
defend the in-tree code.

Anyway, you said "the current length of the runqueue is useful".  It is
only useful in choosing a different stupid runqueue.


Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-08-03 Thread Scott Cheloha
On Thu, Aug 03, 2023 at 09:09:30AM -0600, Theo de Raadt wrote:
> Scott Cheloha  wrote:
> 
> > > > How about this. Kill the spc_ldavg calculation. Its use is more than
> > > > questionable. The cpu selection code using this is not working well and
> > > > process stealing will do the rest.
> > 
> > This is more or less what I said yesterday.  The per-CPU load average
> > is not useful for deciding where to put a thread.
> 
> I guess you have not been reading mpi's scheduler diff.  The entire idea
> of "placing a thread" is 1980's single-processor flawed.

Dude, I'm not talking about mpi's patch, I'm talking about what's in
the tree.

Given the current state of the scheduler, we can throw out spc_ldavg.
It isn't necessary.

> > Of the variables we
> > have available to consider, only the current length of the runqueue is
> > useful.
> 
> No, that concept is also broken.

Again, I am talking about the current scheduler.

Said another way: the current approach can limp along just fine using
only the runqueue length.  We can get rid of spc_ldavg without
worrying about it.



Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-08-03 Thread Theo de Raadt
Scott Cheloha  wrote:

> > > How about this. Kill the spc_ldavg calculation. Its use is more than
> > > questionable. The cpu selection code using this is not working well and
> > > process stealing will do the rest.
> 
> This is more or less what I said yesterday.  The per-CPU load average
> is not useful for deciding where to put a thread.

I guess you have not been reading mpi's scheduler diff.  The entire idea
of "placing a thread" is 1980's single-processor flawed.

> Of the variables we
> have available to consider, only the current length of the runqueue is
> useful.

No, that concept is also broken.

On your 8-cpu laptop, the runqueue does not work at all.  Typically, the
number of available cpus exceeds the number of ready-to-run processes.  For
workloads where the ready process count exceeds the cpus, the processes
get put onto the wrong cpus' queues -- and because scheduler code runs
so rarely, this is all a waste of time.

What actually happens is that pretty much all process selection happens
when a process on a cpu goes to sleep: that cpu finds its runqueue is
empty because other cpus have stolen it empty, so it proceeds to steal
out of another cpu's runqueue.  All process progress really depends upon
stealing processes, fixing up the other cpu's runqueue with locks, and
thus ignoring any pre-calculation by the 'scheduler code'.

All of this stealing requires big locks, to protect the scheduler code
which is occasionally (let's be honest -- rarely?) re-organizing these
stupid runqueues, which then get ignored in the typical case.  Those
locks are so crazy weird, we've been confused for decades about how to
improve them.  It appears there are no small steps, and we probably need
a "Braindead / Dead Alive" lawnmower procedure, and then rebuild afterwards.

I think you will soon join the club of people who believe this code from
1980 is so completely unsuitable that not one line of it can be reused.
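
For readers following along: a heavily simplified sketch of the steal
path described above, in the spirit of kern_sched.c's sched_steal_proc()
but emphatically not the in-tree code.  It assumes SCHED_LOCK() is held
and, for brevity, only scans the highest-priority non-empty queue of
each peer:

struct proc *
steal_sketch(struct cpu_info *self)
{
	struct schedstate_percpu *spc;
	struct proc *p, *best = NULL;
	struct cpu_info *ci;
	CPU_INFO_ITERATOR cii;
	int cost, best_cost = INT_MAX;

	CPU_INFO_FOREACH(cii, ci) {
		if (ci == self)
			continue;
		spc = &ci->ci_schedstate;
		if (spc->spc_whichqs == 0)
			continue;	/* nothing to steal from this cpu */
		/* pick the thread that is cheapest to migrate to us */
		TAILQ_FOREACH(p,
		    &spc->spc_qs[ffs(spc->spc_whichqs) - 1], p_runq) {
			cost = sched_proc_to_cpu_cost(self, p);
			if (cost < best_cost) {
				best = p;
				best_cost = cost;
			}
		}
	}
	if (best != NULL)
		remrunqueue(best);	/* fix up the victim's runqueue */
	return (best);
}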



Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-08-03 Thread Scott Cheloha
On Thu, Aug 03, 2023 at 02:38:11PM +0200, Mark Kettenis wrote:
> > Date: Thu, 3 Aug 2023 12:56:01 +0200
> > From: Claudio Jeker 
> > 
> > On Thu, Aug 03, 2023 at 10:53:24AM +0200, Claudio Jeker wrote:
> > > On Thu, Aug 03, 2023 at 10:13:57AM +0200, Martin Pieuchot wrote:
> > > > On 02/08/23(Wed) 14:22, Claudio Jeker wrote:
> > > > > On Mon, Jul 31, 2023 at 10:21:11AM -0500, Scott Cheloha wrote:
> > > > > > On Fri, Jul 28, 2023 at 07:36:41PM -0500, Scott Cheloha wrote:
> > > > > > > claudio@ notes that uvm_loadav() pointlessly walks the allproc
> > > > > > > list to recompute schedstate_percpu.spc_nrun for each CPU.
> > > > > > > 
> > > > > > > We can just use the value instead of recomputing it.
> > > > > > 
> > > > > > Whoops, off-by-one.  The current load averaging code includes the
> > > > > > running thread in the nrun count if it is *not* the idle thread.
> > > > > 
> > > > > Yes, with this the loadavg seems to be consistent and following the
> > > > > number of running processes. The code seems to behave like before
> > > > > (with all its quirks).
> > > > >
> > > > > OK claudio@, this is a good first step. Now I think this code should
> > > > > later be moved into kern_sched.c or sched_bsd.c and removed from uvm.
> > > > > Not sure why the load calculation is part of memory management...
> > > > >
> > > > > On top of this I wonder about the per-CPU load calculation. In my
> > > > > opinion it is wrong to skip the calculation if the CPU is idle.
> > > > > Because of this there is no decay for idle CPUs and that feels wrong
> > > > > to me.
> > > > > Do we have a userland utility that reports spc_ldavg?
> > > > 
> > > > I don't understand why the SCHED_LOCK() is needed.  Since I'm really
> > > > against adding new uses for it, could you comment on that?
> > > 
> > > The question is how sloppy do we want to be. This code looks at
> > > ci_schedstate (spc_idleproc and spc_nrun) and ci_curproc so to be correct
> > > this needs to lock the scheduler. Do we really want that, hell no.
> >   
> > How about this. Kill the spc_ldavg calculation. Its use is more than
> > questionable. The cpu selection code using this is not working well and
> > process stealing will do the rest.

This is more or less what I said yesterday.  The per-CPU load average
is not useful for deciding where to put a thread.  Of the variables we
have available to consider, only the current length of the runqueue is
useful.

Go for it, kill it.

One nit below.

> > Also use sched_cpu_idle to know if a cpu is idle.

(This is a neat trick, nice.)
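
For context, the trick: sched_idle() parks each idle CPU in the global
sched_idle_cpus cpuset before sleeping and removes it again on wakeup,
so set membership can stand in for the "ci_curproc == spc_idleproc"
comparison without taking SCHED_LOCK().  A sketch of how the counting
side can use it -- an illustration under those assumptions, not the
committed hunk:

extern struct cpuset sched_idle_cpus;

u_int
loadav_nrun_sketch(void)
{
	CPU_INFO_ITERATOR cii;
	struct cpu_info *ci;
	u_int nrun = 0;

	CPU_INFO_FOREACH(cii, ci) {
		nrun += ci->ci_schedstate.spc_nrun;
		/* a cpu absent from the idle set is running a real thread */
		if (!cpuset_isset(&sched_idle_cpus, ci))
			nrun++;
	}
	return (nrun);
}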

> > Index: kern/kern_sched.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_sched.c,v
> > retrieving revision 1.81
> > diff -u -p -r1.81 kern_sched.c
> > --- kern/kern_sched.c   27 Jul 2023 17:52:53 -  1.81
> > +++ kern/kern_sched.c   3 Aug 2023 08:41:38 -
> > @@ -373,7 +373,6 @@ sched_choosecpu_fork(struct proc *parent
> >  {
> >  #ifdef MULTIPROCESSOR
> > struct cpu_info *choice = NULL;
> > -   fixpt_t load, best_load = ~0;
> > int run, best_run = INT_MAX;
> > struct cpu_info *ci;
> > struct cpuset set;
> > @@ -407,13 +406,10 @@ sched_choosecpu_fork(struct proc *parent
> > 	while ((ci = cpuset_first(&set)) != NULL) {
> > 		cpuset_del(&set, ci);
> >  
> > -   load = ci->ci_schedstate.spc_ldavg;
> > run = ci->ci_schedstate.spc_nrun;
> >  
> > -   if (choice == NULL || run < best_run ||
> > -	    (run == best_run && load < best_load)) {
> > +   if (choice == NULL || run < best_run) {
> > choice = ci;
> > -   best_load = load;
> > best_run = run;
> > }
> > }
> > @@ -605,11 +601,6 @@ sched_proc_to_cpu_cost(struct cpu_info *
> >  */
> > if (CPU_IS_PRIMARY(ci))
> > cost += sched_cost_runnable;
> > -
> > -   /*
> > -* Higher load on the destination means we don't want to go there.
> > -*/
> > -   cost += ((sched_cost_load * spc->spc_ldavg) >> FSHIFT);
> >  
> > /*
> >  * If the proc is on this cpu already, lower the cost by how much
> > Index: sys/sched.h
> > ===
> > RCS file: /cvs/src/sys/sys/sched.h,v
> > retrieving revision 1.58
> > diff -u -p -r1.58 sched.h
> > --- sys/sched.h 25 Jul 2023 18:16:19 -  1.58
> > +++ sys/sched.h 3 Aug 2023 08:42:39 -
> > @@ -110,7 +110,6 @@ struct schedstate_percpu {
> > struct clockintr *spc_profclock; /* [o] profclock handle */
> >  
> > u_int spc_nrun; /* procs on the run queues */
> > -   fixpt_t spc_ldavg;  /* shortest load avg. for this cpu */
> >  
> > volatile uint32_t spc_whichqs;
> > volatile u_int spc_spinning;/* this cpu is currently spinning */
> > Index: uvm/uvm_meter.c
> > 

Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-08-03 Thread Mark Kettenis
> Date: Thu, 3 Aug 2023 12:56:01 +0200
> From: Claudio Jeker 
> 
> On Thu, Aug 03, 2023 at 10:53:24AM +0200, Claudio Jeker wrote:
> > On Thu, Aug 03, 2023 at 10:13:57AM +0200, Martin Pieuchot wrote:
> > > On 02/08/23(Wed) 14:22, Claudio Jeker wrote:
> > > > On Mon, Jul 31, 2023 at 10:21:11AM -0500, Scott Cheloha wrote:
> > > > > On Fri, Jul 28, 2023 at 07:36:41PM -0500, Scott Cheloha wrote:
> > > > > > claudio@ notes that uvm_loadav() pointlessly walks the allproc list
> > > > > > to recompute schedstate_percpu.spc_nrun for each CPU.
> > > > > > 
> > > > > > We can just use the value instead of recomputing it.
> > > > > 
> > > > > Whoops, off-by-one.  The current load averaging code includes the
> > > > > running thread in the nrun count if it is *not* the idle thread.
> > > > 
> > > > Yes, with this the loadavg seems to be consistent and following the
> > > > number of running processes. The code seems to behave like before
> > > > (with all its quirks).
> > > >
> > > > OK claudio@, this is a good first step. Now I think this code should
> > > > later be moved into kern_sched.c or sched_bsd.c and removed from uvm.
> > > > Not sure why the load calculation is part of memory management...
> > > >
> > > > On top of this I wonder about the per-CPU load calculation. In my
> > > > opinion it is wrong to skip the calculation if the CPU is idle.
> > > > Because of this there is no decay for idle CPUs and that feels wrong
> > > > to me.
> > > > Do we have a userland utility that reports spc_ldavg?
> > > 
> > > I don't understand why the SCHED_LOCK() is needed.  Since I'm really
> > > against adding new uses for it, could you comment on that?
> > 
> > The question is how sloppy do we want to be. This code looks at
> > ci_schedstate (spc_idleproc and spc_nrun) and ci_curproc so to be correct
> > this needs to lock the scheduler. Do we really want that, hell no.
>   
> How about this. Kill the spc_ldavg calculation. Its use is more than
> questionable. The cpu selection code using this is not working well and
> process stealing will do the rest.
> Also use sched_cpu_idle to know if a cpu is idle.

Given the direction we're heading, this makes sense.  I don't believe
spc_ldavg is doing anything useful at the moment.

ok kettenis@

> Index: kern/kern_sched.c
> ===
> RCS file: /cvs/src/sys/kern/kern_sched.c,v
> retrieving revision 1.81
> diff -u -p -r1.81 kern_sched.c
> --- kern/kern_sched.c 27 Jul 2023 17:52:53 -  1.81
> +++ kern/kern_sched.c 3 Aug 2023 08:41:38 -
> @@ -373,7 +373,6 @@ sched_choosecpu_fork(struct proc *parent
>  {
>  #ifdef MULTIPROCESSOR
>   struct cpu_info *choice = NULL;
> - fixpt_t load, best_load = ~0;
>   int run, best_run = INT_MAX;
>   struct cpu_info *ci;
>   struct cpuset set;
> @@ -407,13 +406,10 @@ sched_choosecpu_fork(struct proc *parent
> 	while ((ci = cpuset_first(&set)) != NULL) {
> 		cpuset_del(&set, ci);
>  
> - load = ci->ci_schedstate.spc_ldavg;
>   run = ci->ci_schedstate.spc_nrun;
>  
> - if (choice == NULL || run < best_run ||
> -	    (run == best_run && load < best_load)) {
> + if (choice == NULL || run < best_run) {
>   choice = ci;
> - best_load = load;
>   best_run = run;
>   }
>   }
> @@ -605,11 +601,6 @@ sched_proc_to_cpu_cost(struct cpu_info *
>*/
>   if (CPU_IS_PRIMARY(ci))
>   cost += sched_cost_runnable;
> -
> - /*
> -  * Higher load on the destination means we don't want to go there.
> -  */
> - cost += ((sched_cost_load * spc->spc_ldavg) >> FSHIFT);
>  
>   /*
>* If the proc is on this cpu already, lower the cost by how much
> Index: sys/sched.h
> ===
> RCS file: /cvs/src/sys/sys/sched.h,v
> retrieving revision 1.58
> diff -u -p -r1.58 sched.h
> --- sys/sched.h   25 Jul 2023 18:16:19 -  1.58
> +++ sys/sched.h   3 Aug 2023 08:42:39 -
> @@ -110,7 +110,6 @@ struct schedstate_percpu {
>   struct clockintr *spc_profclock; /* [o] profclock handle */
>  
>   u_int spc_nrun; /* procs on the run queues */
> - fixpt_t spc_ldavg;  /* shortest load avg. for this cpu */
>  
>   volatile uint32_t spc_whichqs;
>   volatile u_int spc_spinning;/* this cpu is currently spinning */
> Index: uvm/uvm_meter.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_meter.c,v
> retrieving revision 1.46
> diff -u -p -r1.46 uvm_meter.c
> --- uvm/uvm_meter.c   2 Aug 2023 13:54:45 -   1.46
> +++ uvm/uvm_meter.c   3 Aug 2023 10:12:02 -
> @@ -70,7 +70,7 @@ struct loadavg averunnable;
>   * 5 second intervals.
>   */
>  
> -static fixpt_t cexp[3] = {
> +static const fixpt_t cexp[3] = {

Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-08-03 Thread Claudio Jeker
On Thu, Aug 03, 2023 at 10:53:24AM +0200, Claudio Jeker wrote:
> On Thu, Aug 03, 2023 at 10:13:57AM +0200, Martin Pieuchot wrote:
> > On 02/08/23(Wed) 14:22, Claudio Jeker wrote:
> > > On Mon, Jul 31, 2023 at 10:21:11AM -0500, Scott Cheloha wrote:
> > > > On Fri, Jul 28, 2023 at 07:36:41PM -0500, Scott Cheloha wrote:
> > > > > claudio@ notes that uvm_loadav() pointlessly walks the allproc list to
> > > > > recompute schedstate_percpu.spc_nrun for each CPU.
> > > > > 
> > > > > We can just use the value instead of recomputing it.
> > > > 
> > > > Whoops, off-by-one.  The current load averaging code includes the
> > > > running thread in the nrun count if it is *not* the idle thread.
> > > 
> > > Yes, with this the loadavg seems to be consistent and following the number
> > > of running processes. The code seems to behave like before (with all its
> > > quirks).
> > > 
> > > OK claudio@, this is a good first step. Now I think this code should later
> > > be moved into kern_sched.c or sched_bsd.c and removed from uvm. Not sure
> > > why the load calculation is part of memory management...
> > > 
> > > On top of this I wonder about the per-CPU load calculation. In my opinion
> > > it is wrong to skip the calculation if the CPU is idle. Because of this
> > > there is no decay for idle CPUs and that feels wrong to me.
> > > Do we have a userland utility that reports spc_ldavg?
> > 
> > I don't understand why the SCHED_LOCK() is needed.  Since I'm really
> > against adding new uses for it, could you comment on that?
> 
> The question is how sloppy do we want to be. This code looks at
> ci_schedstate (spc_idleproc and spc_nrun) and ci_curproc so to be correct
> this needs to lock the scheduler. Do we really want that, hell no.
  
How about this. Kill the spc_ldavg calculation. Its use is more than
questionable. The cpu selection code using this is not working well and
process stealing will do the rest.
Also use sched_cpu_idle to know if a cpu is idle.

-- 
:wq Claudio

Index: kern/kern_sched.c
===
RCS file: /cvs/src/sys/kern/kern_sched.c,v
retrieving revision 1.81
diff -u -p -r1.81 kern_sched.c
--- kern/kern_sched.c   27 Jul 2023 17:52:53 -  1.81
+++ kern/kern_sched.c   3 Aug 2023 08:41:38 -
@@ -373,7 +373,6 @@ sched_choosecpu_fork(struct proc *parent
 {
 #ifdef MULTIPROCESSOR
struct cpu_info *choice = NULL;
-   fixpt_t load, best_load = ~0;
int run, best_run = INT_MAX;
struct cpu_info *ci;
struct cpuset set;
@@ -407,13 +406,10 @@ sched_choosecpu_fork(struct proc *parent
	while ((ci = cpuset_first(&set)) != NULL) {
		cpuset_del(&set, ci);
 
-   load = ci->ci_schedstate.spc_ldavg;
run = ci->ci_schedstate.spc_nrun;
 
-   if (choice == NULL || run < best_run ||
-	    (run == best_run && load < best_load)) {
+   if (choice == NULL || run < best_run) {
choice = ci;
-   best_load = load;
best_run = run;
}
}
@@ -605,11 +601,6 @@ sched_proc_to_cpu_cost(struct cpu_info *
 */
if (CPU_IS_PRIMARY(ci))
cost += sched_cost_runnable;
-
-   /*
-* Higher load on the destination means we don't want to go there.
-*/
-   cost += ((sched_cost_load * spc->spc_ldavg) >> FSHIFT);
 
/*
 * If the proc is on this cpu already, lower the cost by how much
Index: sys/sched.h
===
RCS file: /cvs/src/sys/sys/sched.h,v
retrieving revision 1.58
diff -u -p -r1.58 sched.h
--- sys/sched.h 25 Jul 2023 18:16:19 -  1.58
+++ sys/sched.h 3 Aug 2023 08:42:39 -
@@ -110,7 +110,6 @@ struct schedstate_percpu {
struct clockintr *spc_profclock; /* [o] profclock handle */
 
u_int spc_nrun; /* procs on the run queues */
-   fixpt_t spc_ldavg;  /* shortest load avg. for this cpu */
 
volatile uint32_t spc_whichqs;
volatile u_int spc_spinning;/* this cpu is currently spinning */
Index: uvm/uvm_meter.c
===
RCS file: /cvs/src/sys/uvm/uvm_meter.c,v
retrieving revision 1.46
diff -u -p -r1.46 uvm_meter.c
--- uvm/uvm_meter.c 2 Aug 2023 13:54:45 -   1.46
+++ uvm/uvm_meter.c 3 Aug 2023 10:12:02 -
@@ -70,7 +70,7 @@ struct loadavg averunnable;
  * 5 second intervals.
  */
 
-static fixpt_t cexp[3] = {
+static const fixpt_t cexp[3] = {
0.9200444146293232 * FSCALE,/* exp(-1/12) */
0.9834714538216174 * FSCALE,/* exp(-1/60) */
0.9944598480048967 * FSCALE,/* exp(-1/180) */
@@ -98,27 +98,16 @@ uvm_meter(void)
 static void
 uvm_loadav(struct loadavg *avg)
 {
+   extern struct cpuset sched_idle_cpus;
CPU_INFO_ITERATOR cii;
struct cpu_info *ci;
-   

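An aside on the cexp[] constants in the hunk above: uvm_loadav() samples
every 5 seconds, so the per-sample decay factors for the 1-, 5-, and
15-minute averages are exp(-5/60) = exp(-1/12), exp(-5/300) = exp(-1/60),
and exp(-5/900) = exp(-1/180), which is exactly where the three values
and their comments come from.
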
Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-08-03 Thread Claudio Jeker
On Thu, Aug 03, 2023 at 10:13:57AM +0200, Martin Pieuchot wrote:
> On 02/08/23(Wed) 14:22, Claudio Jeker wrote:
> > On Mon, Jul 31, 2023 at 10:21:11AM -0500, Scott Cheloha wrote:
> > > On Fri, Jul 28, 2023 at 07:36:41PM -0500, Scott Cheloha wrote:
> > > > claudio@ notes that uvm_loadav() pointlessly walks the allproc list to
> > > > recompute schedstate_percpu.spc_nrun for each CPU.
> > > > 
> > > > We can just use the value instead of recomputing it.
> > > 
> > > Whoops, off-by-one.  The current load averaging code includes the
> > > running thread in the nrun count if it is *not* the idle thread.
> > 
> > Yes, with this the loadavg seems to be consistent and following the number
> > of running processes. The code seems to behave like before (with all its
> > quirks).
> > 
> > OK claudio@, this is a good first step. Now I think this code should later
> > be moved into kern_sched.c or sched_bsd.c and removed from uvm. Not sure why
> > the load calculation is part of memory management...
> > 
> > On top of this I wonder about the per-CPU load calculation. In my opinion
> > it is wrong to skip the calculation if the CPU is idle. Because of this
> > there is no decay for idle CPUs and that feels wrong to me.
> > Do we have a userland utility that reports spc_ldavg?
> 
> I don't understand why the SCHED_LOCK() is needed.  Since I'm really
> against adding new uses for it, could you comment on that?

The question is how sloppy do we want to be. This code looks at
ci_schedstate (spc_idleproc and spc_nrun) and ci_curproc so to be correct
this needs to lock the scheduler. Do we really want that, hell no.
 
> > > Index: uvm_meter.c
> > > ===
> > > RCS file: /cvs/src/sys/uvm/uvm_meter.c,v
> > > retrieving revision 1.44
> > > diff -u -p -r1.44 uvm_meter.c
> > > --- uvm_meter.c   21 Jun 2023 21:16:21 -  1.44
> > > +++ uvm_meter.c   31 Jul 2023 15:20:37 -
> > > @@ -102,43 +102,29 @@ uvm_loadav(struct loadavg *avg)
> > >  {
> > >   CPU_INFO_ITERATOR cii;
> > >   struct cpu_info *ci;
> > > - int i, nrun;
> > > - struct proc *p;
> > > - int nrun_cpu[MAXCPUS];
> > > + struct schedstate_percpu *spc;
> > > + u_int i, nrun = 0, nrun_cpu;
> > > + int s;
> > >  
> > > - nrun = 0;
> > > - memset(nrun_cpu, 0, sizeof(nrun_cpu));
> > >  
> > > -	LIST_FOREACH(p, &allproc, p_list) {
> > > - switch (p->p_stat) {
> > > - case SSTOP:
> > > - case SSLEEP:
> > > - break;
> > > - case SRUN:
> > > - case SONPROC:
> > > - if (p == p->p_cpu->ci_schedstate.spc_idleproc)
> > > - continue;
> > > - /* FALLTHROUGH */
> > > - case SIDL:
> > > - nrun++;
> > > - if (p->p_cpu)
> > > - nrun_cpu[CPU_INFO_UNIT(p->p_cpu)]++;
> > > - }
> > > + SCHED_LOCK(s);
> > > + CPU_INFO_FOREACH(cii, ci) {
> > > +	spc = &ci->ci_schedstate;
> > > + nrun_cpu = spc->spc_nrun;
> > > + if (ci->ci_curproc != spc->spc_idleproc)
> > > + nrun_cpu++;
> > > + if (nrun_cpu == 0)
> > > + continue;
> > > + spc->spc_ldavg = (cexp[0] * spc->spc_ldavg +
> > > + nrun_cpu * FSCALE *
> > > + (FSCALE - cexp[0])) >> FSHIFT;
> > > + nrun += nrun_cpu;
> > >   }
> > > + SCHED_UNLOCK(s);
> > >  
> > >   for (i = 0; i < 3; i++) {
> > >   avg->ldavg[i] = (cexp[i] * avg->ldavg[i] +
> > >   nrun * FSCALE * (FSCALE - cexp[i])) >> FSHIFT;
> > > - }
> > > -
> > > - CPU_INFO_FOREACH(cii, ci) {
> > > -	struct schedstate_percpu *spc = &ci->ci_schedstate;
> > > -
> > > - if (nrun_cpu[CPU_INFO_UNIT(ci)] == 0)
> > > - continue;
> > > - spc->spc_ldavg = (cexp[0] * spc->spc_ldavg +
> > > - nrun_cpu[CPU_INFO_UNIT(ci)] * FSCALE *
> > > - (FSCALE - cexp[0])) >> FSHIFT;
> > >   }
> > >  }
> > >  
> > 
> > -- 
> > :wq Claudio
> > 
> 

-- 
:wq Claudio



Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-08-03 Thread Martin Pieuchot
On 02/08/23(Wed) 14:22, Claudio Jeker wrote:
> On Mon, Jul 31, 2023 at 10:21:11AM -0500, Scott Cheloha wrote:
> > On Fri, Jul 28, 2023 at 07:36:41PM -0500, Scott Cheloha wrote:
> > > claudio@ notes that uvm_loadav() pointlessly walks the allproc list to
> > > recompute schedstate_percpu.spc_nrun for each CPU.
> > > 
> > > We can just use the value instead of recomputing it.
> > 
> > Whoops, off-by-one.  The current load averaging code includes the
> > running thread in the nrun count if it is *not* the idle thread.
> 
> Yes, with this the loadavg seems to be consistent and following the number
> of running processes. The code seems to behave like before (with all its
> quirks).
> 
> OK claudio@, this is a good first step. Now I think this code should later
> be moved into kern_sched.c or sched_bsd.c and removed from uvm. Not sure why
> the load calculation is part of memory management...
> 
> On top of this I wonder about the per-CPU load calculation. In my opinion
> it is wrong to skip the calculation if the CPU is idle. Because of this
> there is no decay for idle CPUs and that feels wrong to me.
> Do we have a userland utility that reports spc_ldavg?

I don't understand why the SCHED_LOCK() is needed.  Since I'm really
against adding new uses for it, could you comment on that?

> > Index: uvm_meter.c
> > ===
> > RCS file: /cvs/src/sys/uvm/uvm_meter.c,v
> > retrieving revision 1.44
> > diff -u -p -r1.44 uvm_meter.c
> > --- uvm_meter.c 21 Jun 2023 21:16:21 -  1.44
> > +++ uvm_meter.c 31 Jul 2023 15:20:37 -
> > @@ -102,43 +102,29 @@ uvm_loadav(struct loadavg *avg)
> >  {
> > CPU_INFO_ITERATOR cii;
> > struct cpu_info *ci;
> > -   int i, nrun;
> > -   struct proc *p;
> > -   int nrun_cpu[MAXCPUS];
> > +   struct schedstate_percpu *spc;
> > +   u_int i, nrun = 0, nrun_cpu;
> > +   int s;
> >  
> > -   nrun = 0;
> > -   memset(nrun_cpu, 0, sizeof(nrun_cpu));
> >  
> > -	LIST_FOREACH(p, &allproc, p_list) {
> > -   switch (p->p_stat) {
> > -   case SSTOP:
> > -   case SSLEEP:
> > -   break;
> > -   case SRUN:
> > -   case SONPROC:
> > -   if (p == p->p_cpu->ci_schedstate.spc_idleproc)
> > -   continue;
> > -   /* FALLTHROUGH */
> > -   case SIDL:
> > -   nrun++;
> > -   if (p->p_cpu)
> > -   nrun_cpu[CPU_INFO_UNIT(p->p_cpu)]++;
> > -   }
> > +   SCHED_LOCK(s);
> > +   CPU_INFO_FOREACH(cii, ci) {
> > +	spc = &ci->ci_schedstate;
> > +   nrun_cpu = spc->spc_nrun;
> > +   if (ci->ci_curproc != spc->spc_idleproc)
> > +   nrun_cpu++;
> > +   if (nrun_cpu == 0)
> > +   continue;
> > +   spc->spc_ldavg = (cexp[0] * spc->spc_ldavg +
> > +   nrun_cpu * FSCALE *
> > +   (FSCALE - cexp[0])) >> FSHIFT;
> > +   nrun += nrun_cpu;
> > }
> > +   SCHED_UNLOCK(s);
> >  
> > for (i = 0; i < 3; i++) {
> > avg->ldavg[i] = (cexp[i] * avg->ldavg[i] +
> > nrun * FSCALE * (FSCALE - cexp[i])) >> FSHIFT;
> > -   }
> > -
> > -   CPU_INFO_FOREACH(cii, ci) {
> > -	struct schedstate_percpu *spc = &ci->ci_schedstate;
> > -
> > -   if (nrun_cpu[CPU_INFO_UNIT(ci)] == 0)
> > -   continue;
> > -   spc->spc_ldavg = (cexp[0] * spc->spc_ldavg +
> > -   nrun_cpu[CPU_INFO_UNIT(ci)] * FSCALE *
> > -   (FSCALE - cexp[0])) >> FSHIFT;
> > }
> >  }
> >  
> 
> -- 
> :wq Claudio
> 



Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-08-02 Thread Claudio Jeker
On Mon, Jul 31, 2023 at 10:21:11AM -0500, Scott Cheloha wrote:
> On Fri, Jul 28, 2023 at 07:36:41PM -0500, Scott Cheloha wrote:
> > claudio@ notes that uvm_loadav() pointlessly walks the allproc list to
> > recompute schedstate_percpu.spc_nrun for each CPU.
> > 
> > We can just use the value instead of recomputing it.
> 
> Whoops, off-by-one.  The current load averaging code includes the
> running thread in the nrun count if it is *not* the idle thread.

Yes, with this the loadavg seems to be consistent and following the number
of running processes. The code seems to behave like before (with all its
quirks).

OK claudio@, this is a good first step. Now I think this code should later
be moved into kern_sched.c or sched_bsd.c and removed from uvm. Not sure why
the load calculation is part of memory management...

On top of this I wonder about the per-CPU load calculation. In my opinion
it is wrong to skip the calculation if the CPU is idle. Because of this
there is no decay for idle CPUs and that feels wrong to me.
Do we have a userland utility that reports spc_ldavg?

> Index: uvm_meter.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_meter.c,v
> retrieving revision 1.44
> diff -u -p -r1.44 uvm_meter.c
> --- uvm_meter.c   21 Jun 2023 21:16:21 -  1.44
> +++ uvm_meter.c   31 Jul 2023 15:20:37 -
> @@ -102,43 +102,29 @@ uvm_loadav(struct loadavg *avg)
>  {
>   CPU_INFO_ITERATOR cii;
>   struct cpu_info *ci;
> - int i, nrun;
> - struct proc *p;
> - int nrun_cpu[MAXCPUS];
> + struct schedstate_percpu *spc;
> + u_int i, nrun = 0, nrun_cpu;
> + int s;
>  
> - nrun = 0;
> - memset(nrun_cpu, 0, sizeof(nrun_cpu));
>  
> -	LIST_FOREACH(p, &allproc, p_list) {
> - switch (p->p_stat) {
> - case SSTOP:
> - case SSLEEP:
> - break;
> - case SRUN:
> - case SONPROC:
> - if (p == p->p_cpu->ci_schedstate.spc_idleproc)
> - continue;
> - /* FALLTHROUGH */
> - case SIDL:
> - nrun++;
> - if (p->p_cpu)
> - nrun_cpu[CPU_INFO_UNIT(p->p_cpu)]++;
> - }
> + SCHED_LOCK(s);
> + CPU_INFO_FOREACH(cii, ci) {
> +	spc = &ci->ci_schedstate;
> + nrun_cpu = spc->spc_nrun;
> + if (ci->ci_curproc != spc->spc_idleproc)
> + nrun_cpu++;
> + if (nrun_cpu == 0)
> + continue;
> + spc->spc_ldavg = (cexp[0] * spc->spc_ldavg +
> + nrun_cpu * FSCALE *
> + (FSCALE - cexp[0])) >> FSHIFT;
> + nrun += nrun_cpu;
>   }
> + SCHED_UNLOCK(s);
>  
>   for (i = 0; i < 3; i++) {
>   avg->ldavg[i] = (cexp[i] * avg->ldavg[i] +
>   nrun * FSCALE * (FSCALE - cexp[i])) >> FSHIFT;
> - }
> -
> - CPU_INFO_FOREACH(cii, ci) {
> -	struct schedstate_percpu *spc = &ci->ci_schedstate;
> -
> - if (nrun_cpu[CPU_INFO_UNIT(ci)] == 0)
> - continue;
> - spc->spc_ldavg = (cexp[0] * spc->spc_ldavg +
> - nrun_cpu[CPU_INFO_UNIT(ci)] * FSCALE *
> - (FSCALE - cexp[0])) >> FSHIFT;
>   }
>  }
>  

-- 
:wq Claudio



Re: uvm_loadav: don't recompute schedstate_percpu.spc_nrun

2023-07-31 Thread Scott Cheloha
On Fri, Jul 28, 2023 at 07:36:41PM -0500, Scott Cheloha wrote:
> claudio@ notes that uvm_loadav() pointlessly walks the allproc list to
> recompute schedstate_percpu.spc_nrun for each CPU.
> 
> We can just use the value instead of recomputing it.

Whoops, off-by-one.  The current load averaging code includes the
running thread in the nrun count if it is *not* the idle thread.

Index: uvm_meter.c
===
RCS file: /cvs/src/sys/uvm/uvm_meter.c,v
retrieving revision 1.44
diff -u -p -r1.44 uvm_meter.c
--- uvm_meter.c 21 Jun 2023 21:16:21 -  1.44
+++ uvm_meter.c 31 Jul 2023 15:20:37 -
@@ -102,43 +102,29 @@ uvm_loadav(struct loadavg *avg)
 {
CPU_INFO_ITERATOR cii;
struct cpu_info *ci;
-   int i, nrun;
-   struct proc *p;
-   int nrun_cpu[MAXCPUS];
+   struct schedstate_percpu *spc;
+   u_int i, nrun = 0, nrun_cpu;
+   int s;
 
-   nrun = 0;
-   memset(nrun_cpu, 0, sizeof(nrun_cpu));
 
-	LIST_FOREACH(p, &allproc, p_list) {
-   switch (p->p_stat) {
-   case SSTOP:
-   case SSLEEP:
-   break;
-   case SRUN:
-   case SONPROC:
-   if (p == p->p_cpu->ci_schedstate.spc_idleproc)
-   continue;
-   /* FALLTHROUGH */
-   case SIDL:
-   nrun++;
-   if (p->p_cpu)
-   nrun_cpu[CPU_INFO_UNIT(p->p_cpu)]++;
-   }
+   SCHED_LOCK(s);
+   CPU_INFO_FOREACH(cii, ci) {
+	spc = &ci->ci_schedstate;
+   nrun_cpu = spc->spc_nrun;
+   if (ci->ci_curproc != spc->spc_idleproc)
+   nrun_cpu++;
+   if (nrun_cpu == 0)
+   continue;
+   spc->spc_ldavg = (cexp[0] * spc->spc_ldavg +
+   nrun_cpu * FSCALE *
+   (FSCALE - cexp[0])) >> FSHIFT;
+   nrun += nrun_cpu;
}
+   SCHED_UNLOCK(s);
 
for (i = 0; i < 3; i++) {
avg->ldavg[i] = (cexp[i] * avg->ldavg[i] +
nrun * FSCALE * (FSCALE - cexp[i])) >> FSHIFT;
-   }
-
-   CPU_INFO_FOREACH(cii, ci) {
-	struct schedstate_percpu *spc = &ci->ci_schedstate;
-
-   if (nrun_cpu[CPU_INFO_UNIT(ci)] == 0)
-   continue;
-   spc->spc_ldavg = (cexp[0] * spc->spc_ldavg +
-   nrun_cpu[CPU_INFO_UNIT(ci)] * FSCALE *
-   (FSCALE - cexp[0])) >> FSHIFT;
}
 }
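
A closing note on the fixed-point arithmetic in the hunk above:
spc_ldavg and avg->ldavg[] are exponentially weighted moving averages,
load' = decay * load + nrun * (1 - decay), carried out with
FSCALE = 1 << FSHIFT (FSHIFT is 11 in sys/param.h, so FSCALE is 2048).
A minimal userland sketch of the same update, an illustration assuming
only those stock constants:

#include <stdio.h>
#include <stdint.h>

#define FSHIFT	11			/* stock value from sys/param.h */
#define FSCALE	(1 << FSHIFT)

typedef uint32_t fixpt_t;

int
main(void)
{
	/* exp(-1/12) in fixed point: the 1-minute decay per 5s sample */
	fixpt_t cexp0 = 0.9200444146293232 * FSCALE;
	fixpt_t ldavg = 0;
	unsigned int nrun = 2;
	int i;

	/* same update as uvm_loadav(); converges toward nrun, i.e. 2.00 */
	for (i = 0; i < 48; i++)	/* 48 samples = 4 minutes */
		ldavg = (cexp0 * ldavg +
		    nrun * FSCALE * (FSCALE - cexp0)) >> FSHIFT;

	printf("1-minute load after 4 minutes at nrun=2: %.2f\n",
	    (double)ldavg / FSCALE);
	return 0;
}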