> In a crude test on a 1.5GHz p4 willamette with a local fossil/venti and
> 256mb of ram, 'time mk 'CONF=pcf' > /dev/null' in /sys/src/9/pc, on a
> fully-built source tree, adding the PAUSE reduced times from an average of
> 18.97s to 18.84s (across ten runs).
we tried this at coraid years ago. it's a win, but only on the p4 and netburst-based xeons with old-and-crappy hyperthreading enabled. it seems to otherwise be a small loss.

i don't see an actual performance problem on the 16-cpu machine. i see an apparent performance problem. the 4- and 16-processor machines have a single-threaded speed ratio of ~ 1:1.7, so since kprof does sampling on the clock interrupt, it seems reasonable that processors could get in a timing-predictable loop and get sampled at different places each time. no way rebalance is using 40% of the cpu, right?

the anomaly in time(1) is not yet explained. but it's clearly not much of a performance problem: there was only a 10% slowdown between 1 core busy and 16 cores busy. that's likely due to the fact that plan 9 knows nothing of the numa nature of that board.

richard miller does point out a real problem. idlehands just returns if conf.nproc>1. this is done so we don't have to wait for the next clock tick should work become available. this is a power management problem, not a performance problem. your interesting locking solution posted previously doesn't help with this. it's not even a locking problem.

a potential solution to this would be to have a new bit array, e.g. active.schedwait, which is set when a mach has no work. the mach could then call halt. a mach could then check for an idle mach to wake after readying a proc. an apic ipi would be a suitable wakeup mechanism with r.t. latencies < 500ns. (www.barrelfish.org/barrelfish_mmcs08.pdf) one assumes that 500ns/2 + wakeup time ≈ wakeup time.

two unfinished thoughts:

1. it sure wouldn't surprise me if this has been done in plan 9 before. i'd be interested to know what ken's sequent kernel did.

2. if today 16 machs are possible (and 128 on an intel xeon mp 7500: 8 sockets * 8 cores * 2 threads = 128), what do we expect in 5 years? 128?

- erik
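
a rough sketch of the active.schedwait bookkeeping suggested above, in portable c11 rather than kernel c. everything here is an assumption made for illustration: the names markidle, markbusy and claimidle are hypothetical, the single 64-bit word limits it to 64 machs, and the halt and the apic ipi themselves are left out. it only shows how an idle mach could advertise itself and how the mach that readied a proc could claim exactly one idle mach to wake.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { MAXMACH = 64 };	/* assumption: one 64-bit word covers every mach */

/* hypothetical stand-in for active.schedwait: bit m set means mach m is idle */
static _Atomic uint64_t schedwait;

/* called by mach m just before it would halt */
static void
markidle(unsigned m)
{
	assert(m < MAXMACH);
	atomic_fetch_or(&schedwait, (uint64_t)1 << m);
}

/* called by mach m if it wakes for some other reason (e.g. an interrupt) */
static void
markbusy(unsigned m)
{
	assert(m < MAXMACH);
	atomic_fetch_and(&schedwait, ~((uint64_t)1 << m));
}

/*
 * called after a proc is readied: claim the lowest-numbered idle mach
 * and return its number, or -1 if no mach is idle.
 * the caller would then send the wakeup ipi to that mach.
 */
static int
claimidle(void)
{
	uint64_t w, bit;
	int m;

	w = atomic_load(&schedwait);
	while(w != 0){
		bit = w & -w;	/* isolate the lowest set bit */
		/* try to clear it; on failure w is reloaded and we retry */
		if(atomic_compare_exchange_weak(&schedwait, &w, w & ~bit)){
			for(m = 0; (bit >> m) != 1; m++)
				;
			return m;
		}
	}
	return -1;
}
```

clearing the bit with a compare-and-swap means two machs that ready procs at the same time cannot both pick the same sleeper, so at most one ipi is sent per idle mach.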
