On 15-Jul-08, at 1:01 AM, Bakul Shah wrote:
I suspect a lot of this complexity will end up being dropped when you don't have to worry about efficiently using the last N% of cpu cycles.
Would that I weren't working on a multi-core graphics part... That N% is what the game is all about.
When your bottleneck is memory bandwidth, using a core 100% is not going to happen in general.
But in most cases, that memory movement has to share the bus with increasingly remote cache accesses, which in turn take bandwidth. Affinity is a serious win for reducing on-chip bandwidth usage in cache-coherent many-core systems.
And I am not sure thread placement belongs in the kernel. Why not let an application manage its allocation of h/w thread x cycle resources? I am not even sure a full kernel belongs on every core.
I'm still looking for the right scheduler, in kernel or user space, that lets me deal with affinitizing 3 resources that run at different granularities: per-core cache, hardware-thread-to-core, and cross-chip caches. There's a rough hierarchy implied by these three resources, and perfect scheduling might be possible in a purely cooperative world, but reality imposes pre-emption and resource virtualization.
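To make the hierarchy concrete, here is a rough sketch of how the set of candidate hardware threads widens at each level. It assumes a made-up symmetric machine; the names HwThread, build_topology and affinity_mask are hypothetical, and the three levels ("thread", "core", "chip") only loosely stand in for the three resources described above. It deliberately ignores pre-emption and virtualization, which are the hard part.

```python
from collections import namedtuple

# Hypothetical topology record: one entry per hardware thread.
HwThread = namedtuple("HwThread", "cpu core chip")


def build_topology(chips, cores_per_chip, threads_per_core):
    """Enumerate hardware threads for an assumed symmetric machine."""
    topo = []
    cpu = 0
    for chip in range(chips):
        for core in range(cores_per_chip):
            core_id = chip * cores_per_chip + core  # globally unique core id
            for _ in range(threads_per_core):
                topo.append(HwThread(cpu, core_id, chip))
                cpu += 1
    return topo


def affinity_mask(topo, cpu, level):
    """Return the CPUs that share the given level with `cpu`.

    'thread' pins to one hardware thread, 'core' allows any thread on
    the same core (shared per-core cache), and 'chip' allows any thread
    on the same chip (shared on-chip cache).
    """
    me = next(t for t in topo if t.cpu == cpu)
    if level == "thread":
        return {me.cpu}
    if level == "core":
        return {t.cpu for t in topo if t.core == me.core}
    if level == "chip":
        return {t.cpu for t in topo if t.chip == me.chip}
    raise ValueError("unknown level: %r" % (level,))


if __name__ == "__main__":
    topo = build_topology(chips=2, cores_per_chip=2, threads_per_core=2)
    for level in ("thread", "core", "chip"):
        print(level, sorted(affinity_mask(topo, 5, level)))
```

On Linux, a user-space scheduler could hand a mask like this to os.sched_setaffinity(0, mask) for the calling process, but that only requests placement; the kernel still pre-empts as it sees fit.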
Unlike you, I think the kernel should do even less as more and more cores are added. It should basically stay out of the way. Less government, more privatization :-) So maybe the Plan 9 kernel would be a better starting point than a Unix kernel.
Agreed, less and less in the kernel, but *enough*. I like resource virtualization, and as long as it gets affinity right, I win.
Paul
