> In a crude test on a 1.5GHz p4 willamette with a local fossil/venti and
> 256mb of ram, 'time mk 'CONF=pcf' > /dev/null' in /sys/src/9/pc, on a
> fully-built source tree, adding the PAUSE reduced times from an average of
> 18.97s to 18.84s (across ten runs).

we tried this at coraid years ago.  it's a win — but only on the p4 and
netburst-based xeons with old-and-crappy hyperthreading enabled.  it
seems to otherwise be a small loss.
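for concreteness, here's roughly where that PAUSE lands: in the spin of
a test-and-set lock.  this is a user-space sketch with made-up names
(spinlock, worker), not plan 9's actual lock(2) internals:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag lck = ATOMIC_FLAG_INIT;
static int counter;

static void
spinlock(void)
{
	while(atomic_flag_test_and_set_explicit(&lck, memory_order_acquire)){
#if defined(__i386__) || defined(__x86_64__)
		/* rep; nop: cedes pipeline resources to the sibling
		   hyperthread while we spin; this is the hint that
		   helps on netburst */
		__builtin_ia32_pause();
#endif
	}
}

static void
spinunlock(void)
{
	atomic_flag_clear_explicit(&lck, memory_order_release);
}

static void*
worker(void *arg)
{
	int i;

	(void)arg;
	for(i = 0; i < 100000; i++){
		spinlock();
		counter++;
		spinunlock();
	}
	return 0;
}
```

on a single hardware thread the pause is close to a no-op, which is
consistent with the small loss measured elsewhere.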

i don't see an actual performance problem on the 16-cpu machine;
i see an apparent performance problem.  the 4- and 16-processor
machines have a single-threaded speed ratio of ~1:1.7, and since
kprof does sampling on the clock interrupt, it seems reasonable
that processors could get into a timing-predictable loop and get
sampled at different places each time.  no way rebalance is using
40% of the cpu, right?  the anomaly in time(1) is not yet explained,
but it's clearly not much of a performance problem: there was only
a 10% slowdown between 1 core busy and 16 cores busy.  that's
likely due to the fact that plan 9 knows nothing of the numa nature
of that board.

richard miller does point out a real problem.  idlehands just returns
if conf.nproc>1.  this is done so we don't have to wait for the next
clock tick should work become available.  this is a power management
problem, not a performance problem.  your interesting locking solution
posted previously doesn't help with this.  it's not even a locking problem.

a potential solution to this would be to have a new bit array, e.g.
active.schedwait, in which a mach sets its bit when it finds no
runnable procs.  the mach could then call halt.  a mach that readies
a proc could then check for an idle mach to wake.  an apic ipi would
be a suitable wakeup mechanism, with round-trip latencies < 500ns
(www.barrelfish.org/barrelfish_mmcs08.pdf).
one assumes that 500ns/2 + wakeup time ≈ wakeup time.
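the scheme can be sketched in user space, with a condvar standing in
for halt and a signal standing in for the apic ipi.  the names here
(schedwait, wakeidle, exiting, run_demo) are mine for illustration,
not kernel code:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

enum { NMACH = 4, NWORK = 100 };

static atomic_uint schedwait;		/* bit m set: mach m is idle */
static atomic_int nwork;		/* runnable "procs" */
static atomic_int done;			/* procs run */
static atomic_int exiting;		/* tell machs to exit */
static pthread_mutex_t lk[NMACH];
static pthread_cond_t wakeup[NMACH];

/* stand-in for sending an apic ipi to mach m */
static void
ipi(int m)
{
	pthread_mutex_lock(&lk[m]);
	pthread_cond_signal(&wakeup[m]);
	pthread_mutex_unlock(&lk[m]);
}

/* after readying a proc, wake at most one idle mach */
static void
wakeidle(void)
{
	unsigned w;
	int m;

	for(w = atomic_load(&schedwait); w != 0; w &= ~(1u<<m)){
		m = __builtin_ctz(w);
		/* claim the bit; if we raced, try the next idle mach */
		if(atomic_fetch_and(&schedwait, ~(1u<<m)) & (1u<<m)){
			ipi(m);
			return;
		}
	}
}

static void*
mach(void *arg)
{
	int m, n;

	m = (int)(uintptr_t)arg;
	for(;;){
		n = atomic_load(&nwork);
		if(n > 0){
			if(atomic_compare_exchange_weak(&nwork, &n, n-1))
				atomic_fetch_add(&done, 1);	/* "run" the proc */
			continue;
		}
		if(atomic_load(&exiting))
			break;
		atomic_fetch_or(&schedwait, 1u<<m);	/* advertise idleness */
		pthread_mutex_lock(&lk[m]);
		/* recheck under the lock so a wakeup can't be lost */
		if(atomic_load(&nwork) == 0 && !atomic_load(&exiting))
			pthread_cond_wait(&wakeup[m], &lk[m]);	/* "halt" */
		pthread_mutex_unlock(&lk[m]);
		atomic_fetch_and(&schedwait, ~(1u<<m));
	}
	return 0;
}

static int
run_demo(void)
{
	pthread_t t[NMACH];
	int i;

	atomic_store(&schedwait, 0);
	atomic_store(&nwork, 0);
	atomic_store(&done, 0);
	atomic_store(&exiting, 0);
	for(i = 0; i < NMACH; i++){
		pthread_mutex_init(&lk[i], 0);
		pthread_cond_init(&wakeup[i], 0);
		pthread_create(&t[i], 0, mach, (void*)(uintptr_t)i);
	}
	for(i = 0; i < NWORK; i++){
		atomic_fetch_add(&nwork, 1);	/* ready a "proc" */
		wakeidle();
	}
	while(atomic_load(&done) < NWORK)
		sched_yield();
	atomic_store(&exiting, 1);
	for(i = 0; i < NMACH; i++)
		ipi(i);				/* wake everyone to exit */
	for(i = 0; i < NMACH; i++)
		pthread_join(t[i], 0);
	return atomic_load(&done);
}
```

the recheck of nwork under the per-mach lock is the delicate part; in
the kernel the equivalent sequencing would be against interrupts, not
a mutex.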

two unfinished thoughts:

1.  it sure wouldn't surprise me if this has been done in plan 9 before.
i'd be interested to know what ken's sequent kernel did.

2.  if today 16 machs are possible (and 128 on an intel xeon mp 7500—
8 sockets * 8 core * 2t = 128), what do we expect in 5 years?  128?

- erik
