> Do you have a way to turn off one of the sockets on "c" (2 x E5540) and get
> the numbers with HT (8 processors) and without HT (4 processors)? It would
> also be interesting to see "c" with HT turned off.
here's the progression
procs   user    sys     real   %ilock
4       4.41u   1.83s   4.06r   0.0
8       4.47u   2.37s   3.60r   2.0
12      4.49u   8.34s   4.40r  11.0
16      4.36u  13.16s   4.43r  14.7
here's a fun little calculation:
	16 threads * 4.43s * 0.147 + 1.83s baseline
	= 10.42s + 1.83s
	= 12.25s
against the 13.16s of system time actually measured with 16 threads.
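the same estimate can be run over every row of the table. this is a minimal sketch that assumes, as the calculation does, that %ilock is the share of each processor's wall-clock time spent spinning on ilocks and that all of that spinning is charged as system time. struct run and predictsys are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* one row of the table: processors, user/sys/real seconds, %ilock */
struct run {
	int nproc;
	double user, sys, real, ilockpct;
};

static const struct run runs[] = {
	{ 4, 4.41, 1.83, 4.06, 0.0 },
	{ 8, 4.47, 2.37, 3.60, 2.0 },
	{ 12, 4.49, 8.34, 4.40, 11.0 },
	{ 16, 4.36, 13.16, 4.43, 14.7 },
};

enum { NRUNS = sizeof runs / sizeof runs[0] };

/*
 * back-of-envelope model (an assumption): every processor spends
 * ilockpct% of the real time spinning, and that spinning is added
 * as system time on top of the baseline run's system time.
 */
static double
predictsys(const struct run *r, double baseline)
{
	assert(r->ilockpct >= 0.0 && r->ilockpct <= 100.0);
	return r->nproc * r->real * (r->ilockpct / 100.0) + baseline;
}
```

with the 4-processor system time as the baseline, it predicts roughly 2.41s, 7.64s and 12.25s for 8, 12 and 16 processors, against 2.37s, 8.34s and 13.16s measured.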
so increased ilock contention appears to account for most
of the increase in system time.
ilock accounting shows that most (>80%) of the long-held ilocks
(held >8.5µs, ~21k cycles) are acquired at /sys/src/libc/port/pool.c:1318.
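a minimal sketch of that kind of accounting, assuming a cycle counter is read at acquire and release and that each lock remembers the file and line that took it. holdlog, noteheld, topshare and us2cycles are hypothetical names, not the plan 9 kernel's ilock code. the threshold conversion uses the E5540's 2.53GHz nominal clock: 8.5µs * 2.53GHz is about 21.5k cycles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NSITE = 64 };

/* long holds charged to one acquiring call site */
struct holdstat {
	const char *file;
	int line;
	unsigned long nlong;
};

struct holdlog {
	uint64_t threshold;	/* holds longer than this many cycles are "long" */
	struct holdstat site[NSITE];
	int nsite;
	unsigned long total;	/* long holds across all sites */
};

/* hold-time limit in microseconds -> cycles at a clock of ghz, rounded */
static uint64_t
us2cycles(double us, double ghz)
{
	return (uint64_t)(us * ghz * 1000.0 + 0.5);
}

/* call at unlock time with the cycle counter values at acquire and release */
static void
noteheld(struct holdlog *h, const char *file, int line,
	uint64_t acquired, uint64_t released)
{
	int i;

	assert(released >= acquired);
	if(released - acquired <= h->threshold)
		return;
	h->total++;
	for(i = 0; i < h->nsite; i++)
		if(h->site[i].line == line && strcmp(h->site[i].file, file) == 0){
			h->site[i].nlong++;
			return;
		}
	if(h->nsite < NSITE){
		h->site[h->nsite].file = file;
		h->site[h->nsite].line = line;
		h->site[h->nsite].nlong = 1;
		h->nsite++;
	}
	/* if the table is full, the hold still counts toward total */
}

/* share of all long holds taken at the busiest site; *idx gets its index or -1 */
static double
topshare(const struct holdlog *h, int *idx)
{
	int i, best = -1;

	for(i = 0; i < h->nsite; i++)
		if(best < 0 || h->site[i].nlong > h->site[best].nlong)
			best = i;
	*idx = best;
	if(best < 0 || h->total == 0)
		return 0.0;
	return (double)h->site[best].nlong / h->total;
}
```

a report of ">80% at pool.c:1318" corresponds to topshare returning more than 0.8 with that site as the winner.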
this is no surprise. technically, a long-held ilock is not
really a problem until somebody else wants it, but we
can be fairly certain that allocb/malloc is a contended code
path.
hopefully i'll be able to test a less-contended replacement for
allocb/freeb before i run out of time with this machine.
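one common way to take pressure off a shared allocator lock is a small per-processor free list in front of it, so most allocate/free pairs never touch the shared lock. this is a minimal sketch of that general idea, not the replacement mentioned above: blk, blkalloc, blkfree, NCPU and CACHESZ are hypothetical, a C11 spin lock stands in for ilock, and each cache is assumed to be used only by its own processor with preemption off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

enum { NCPU = 8, CACHESZ = 16 };

/* stand-in for a network block; a real Block carries more fields */
struct blk {
	struct blk *next;
	unsigned char data[2048];
};

/* shared pool, guarded by a spin lock in the spirit of ilock */
static atomic_flag poollk = ATOMIC_FLAG_INIT;
static struct {
	struct blk *free;
	unsigned long nlocked;	/* times the shared lock was taken */
} pool;

/* per-processor caches; each is used only by its own processor */
static struct {
	struct blk *free;
	int n;
} cache[NCPU];

static void
lockpool(void)
{
	while(atomic_flag_test_and_set_explicit(&poollk, memory_order_acquire))
		;	/* spin */
	pool.nlocked++;
}

static void
unlockpool(void)
{
	atomic_flag_clear_explicit(&poollk, memory_order_release);
}

static struct blk *
blkalloc(int cpu)
{
	struct blk *b;

	assert(cpu >= 0 && cpu < NCPU);
	if((b = cache[cpu].free) != NULL){	/* fast path: no shared lock */
		cache[cpu].free = b->next;
		cache[cpu].n--;
		return b;
	}
	lockpool();
	if((b = pool.free) != NULL)
		pool.free = b->next;
	unlockpool();
	if(b == NULL)
		b = malloc(sizeof *b);	/* may return NULL */
	return b;
}

static void
blkfree(int cpu, struct blk *b)
{
	assert(cpu >= 0 && cpu < NCPU);
	if(cache[cpu].n < CACHESZ){	/* fast path: no shared lock */
		b->next = cache[cpu].free;
		cache[cpu].free = b;
		cache[cpu].n++;
		return;
	}
	lockpool();
	b->next = pool.free;
	pool.free = b;
	unlockpool();
}
```

a processor that frees and then reallocates a block stays on its own list; only cache misses and overflows take the shared lock, which shrinks the window in which anybody else can want it.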
> Certainly it seems to me that idlehands needs to be fixed,
> your bit array "active.schedwait" is one way.
i'm not convinced that idlehands is anything more than a power-waster;
performance-wise, it's nearly ideal.
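for comparison, a minimal sketch of the bit-array approach suggested above, assuming at most 64 processors and a wakeup interrupt that is not shown. schedwait, markidle, markbusy and claimidle are hypothetical names, not existing plan 9 kernel code. idle processors advertise themselves in the mask, and whoever readies a process claims exactly one of them with a compare-and-swap, so only that processor needs to be woken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { MAXCPU = 64 };

/* bit n set: processor n is idle and waiting to be woken */
static _Atomic uint64_t schedwait;

/* called by an idle processor just before it halts */
static void
markidle(int cpu)
{
	assert(cpu >= 0 && cpu < MAXCPU);
	atomic_fetch_or(&schedwait, (uint64_t)1 << cpu);
}

/* called by a processor that finds work on its own after marking idle */
static void
markbusy(int cpu)
{
	assert(cpu >= 0 && cpu < MAXCPU);
	atomic_fetch_and(&schedwait, ~((uint64_t)1 << cpu));
}

/*
 * called when a process becomes ready: claim the lowest-numbered
 * idle processor by clearing its bit and return its number (the
 * caller would then interrupt it), or -1 if none is idle.
 */
static int
claimidle(void)
{
	uint64_t old, new;
	int cpu;

	old = atomic_load(&schedwait);
	do{
		if(old == 0)
			return -1;
		for(cpu = 0; (old & ((uint64_t)1 << cpu)) == 0; cpu++)
			;
		new = old & ~((uint64_t)1 << cpu);
	}while(!atomic_compare_exchange_weak(&schedwait, &old, new));
	return cpu;
}
```

what the mask buys is the ability to halt idle processors instead of spinning them, which is a power win rather than a throughput win.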
- erik