On Fri, Nov 02, 2018 at 10:19:15PM +0100, Mark Kettenis wrote:
> I think Ted is pointing out a different issue here that doesn't really
> have anything to do with SMT.  The issue is that in some cases we end
> up with CPUs reporting being 100% busy running the idle thread instead
> of reporting being 100% idle.  This happens quite a lot on machines
> with lots of CPUs immediately after they are booted.  Usually this
> funny state disappears after some time.
> 
> An idle CPU is of course running the idle thread, so in that sense
> this isn't super-strange.  But it does indicate there is some kind of
> accounting issue.  I have a feeling this happens before any processes
> have been scheduled on these CPUs.  But I've never found the problem...

I can easily reproduce this on a 2-socket machine with 4 cores per
socket.  Hyperthreading is turned off in the BIOS.

schedcpu() does not account the run time if p->p_slptime > 1, so
fresh idle threads show a CPU percentage of 99%.

USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
root     11006 99.0  0.0     0     0 ??  RK/6  10:19PM    0:15.51 (idle6)
root     26529 99.0  0.0     0     0 ??  RK/7  10:19PM    2:55.45 (idle7)
root     55382 99.0  0.0     0     0 ??  RK/3  10:19PM    2:55.27 (idle3)
root     84574 99.0  0.0     0     0 ??  RK/4  10:19PM    2:54.90 (idle4)
root     53490 99.0  0.0     0     0 ??  RK/5  10:19PM    2:49.72 (idle5)
root     59318 73.2  0.0     0     0 ??  DK    10:19PM    2:53.36 (idle0)
root     24358  0.0  0.0     0     0 ??  RK/1  10:19PM    2:54.66 (idle1)
root     15902  0.0  0.0     0     0 ??  RK/2  10:19PM    2:55.04 (idle2)

This effect goes away after running multiple processes on all cores.

USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
root     55382 11.7  0.0     0     0 ??  RK/3  10:19PM    4:06.52 (idle3)
root     84574 11.6  0.0     0     0 ??  DK    10:19PM    4:12.00 (idle4)
root     53490 11.7  0.0     0     0 ??  RK/5  10:19PM    4:11.94 (idle5)
root     11006 11.7  0.0     0     0 ??  RK/6  10:19PM    3:48.90 (idle6)
root     26529  1.3  0.0     0     0 ??  RK/7  10:19PM    4:11.81 (idle7)
root     59318  0.4  0.0     0     0 ??  DK    10:19PM    4:09.78 (idle0)
root     24358  0.0  0.0     0     0 ??  RK/1  10:19PM    4:11.37 (idle1)
root     15902  0.0  0.0     0     0 ??  RK/2  10:19PM    4:11.87 (idle2)

If I initialize p_slptime to 127, the 99% effect does not appear in
the first place.

USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
root     86374  0.0  0.0     0     0 ??  RK/1  10:18PM    3:10.15 (idle1)
root     78654  0.0  0.0     0     0 ??  DK    10:18PM    3:09.52 (idle2)
root     85088  0.0  0.0     0     0 ??  DK    10:18PM    3:10.82 (idle3)
root     34080  0.0  0.0     0     0 ??  RK/4  10:18PM    2:06.89 (idle4)
root     65391  0.0  0.0     0     0 ??  RK/5  10:18PM    0:20.39 (idle5)
root      6812  0.0  0.0     0     0 ??  RK/6  10:18PM    0:36.33 (idle6)
root      6433  0.0  0.0     0     0 ??  RK/7  10:18PM    0:53.09 (idle7)
root     87244  0.0  0.0     0     0 ??  RK/0  10:18PM    3:07.58 (idle0)

There are still things I don't understand.  After a while, the CPU
time for idle5, idle6, and idle7 stops increasing.  I am running
iperf3 performance tests on this machine; my patch makes the results
more unsteady and lowers throughput.  It seems that iperf3 processes
get scheduled on CPUs with worse memory affinity.

bluhm

Index: kern/kern_sched.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sched.c,v
retrieving revision 1.54
diff -u -p -r1.54 kern_sched.c
--- kern/kern_sched.c   17 Nov 2018 23:10:08 -0000      1.54
+++ kern/kern_sched.c   17 Dec 2018 19:35:33 -0000
@@ -145,6 +145,7 @@ sched_idle(void *v)
         */
        SCHED_LOCK(s);
        cpuset_add(&sched_idle_cpus, ci);
+       p->p_slptime = 127;
        p->p_stat = SSLEEP;
        p->p_cpu = ci;
        atomic_setbits_int(&p->p_flag, P_CPUPEG);
