Thanks for the thorough explanation! It all makes sense now.
I wish you luck moving forward with this project.

Thanks,
Roman.

P.S. Keep us posted ;-)

On Wed, Jun 07, 2006 at 11:19:13PM -0600, Ronald G Minnich wrote:
> Roman Shaposhnik wrote:
> >One question that I still have, though, is what makes you think
> >that once you're done with porting gcc (big task) and porting HPC
> >apps to gcc/Plan9 (an even bigger one!) they will *execute* faster
> >than they do on Linux?
> 
> Excellent question.
> 
> It's all about parallel performance; making sure your 1000 nodes run
> 1000 times as fast as 1 node, or, if they don't, that it's Somebody
> Else's Problem. The OS can impact parallel performance because the
> kinds of tasks that go on in OSes can run at awkward times, interfere
> with parallel applications, and degrade performance. (For another
> approach, see Cray's synchronised-scheduler work: make all nodes
> schedule the app at the same time.)
> 
> Imagine you have one of these lovely apps, on a 1000-node cluster with
> a 5-microsecond-latency network. Let us further imagine (this stuff
> exists; see Quadrics) that you can do a broadcast/global-sum op in 5
> microseconds. After 1 millisecond of computation, the nodes all need
> to talk to each other, and cannot proceed until they all agree on
> (say) the value of a computed number -- e.g. some sort of global sum
> of a variable held by each of 1000 procs. The generic term for this
> type of thing is 'global reduction' -- you reduce a vector to a scalar
> of some sort.
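> 
> In MPI terms (a sketch of my own, not code from any of these apps;
> do_local_work is a made-up name, but MPI_Allreduce is the standard
> call for this kind of global sum), the inner loop looks roughly like:
> 
>     /* skeleton of a bulk-synchronous timestep: ~1 ms of local work,
>        then a global sum no proc can pass until every proc arrives */
>     #include <mpi.h>
> 
>     extern double do_local_work(void);  /* hypothetical ~1 ms kernel */
> 
>     void
>     timestep_loop(int nsteps)
>     {
>         int i;
>         double local, global;
> 
>         for(i = 0; i < nsteps; i++){
>             local = do_local_work();
>             /* blocks until all procs contribute; the slowest
>                proc sets the pace for everyone */
>             MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
>                 MPI_COMM_WORLD);
>         }
>     }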
> 
> The math is pretty easy to do, but it boils down to this: OS
> activities can interfere with, say, just one task and kill the
> parallel performance of the app, making your 1000-node app run like a
> 750-node app -- or worse.
> 
> Pretend you're delayed one microsecond; do the math; it's depressing.
> A one-millisecond compute interval is a really extreme case, chosen
> for ease of illustration, but ...
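> 
> To make 'do the math' concrete, here is the back-of-envelope version
> (the numbers are mine, picked only for illustration). The global sum
> waits for the slowest node, so a step is late whenever *any* of the
> 1000 nodes takes an OS hit during it:
> 
>     /* back-of-envelope jitter arithmetic -- assumed numbers */
>     #include <stdio.h>
>     #include <math.h>
> 
>     int
>     main(void)
>     {
>         int nodes = 1000;
>         double p = 0.001;       /* chance a given node is interrupted
>                                    during one 1 ms step (assumed) */
>         double step = 1005e-6;  /* 1 ms compute + 5 us reduce */
>         double hit = 250e-6;    /* assumed cost of one OS detour */
> 
>         /* the step is delayed if ANY node is hit: */
>         printf("P(step delayed) = %.0f%%\n",
>             100.0 * (1.0 - pow(1.0 - p, nodes)));     /* ~63% */
>         /* and every hit stretches the step for all 1000 nodes: */
>         printf("efficiency = %.0f%%\n",
>             100.0 * step / (step + hit));             /* ~80% */
>         return 0;
>     }
> 
> A 0.1% per-node chance of interference ends up delaying nearly two
> thirds of all steps once 1000 nodes have to agree; that is how a
> 1000-node app comes to run like a 750- or 800-node one.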
> 
> In the clustering world, what a lot of people do is run really heavy
> nodes in clusters -- they have stuff like cron running, if you can
> believe it! They pretty much do a full desktop install, then turn off
> a few daemons, and away they go. Some really famous companies actually
> run clusters this way -- you'd be surprised at who. So do some famous
> gov't labs.
> 
> If they're lucky, interference never hits them. If they're not, they
> get less-than-ideal app performance. Then, from the OS interference
> that comes with such a bad configuration, they draw a conjecture: you
> can't run a cluster node with anything but a custom OS which has no
> clock interrupts and, for that matter, no ability to run more than one
> process at a time. See the compute node kernel on BG/L for one
> example, or the Catamount kernel on Red Storm. Those kernels are
> really constrained; running just one proc at a time is only part of
> the story.
> 
> Here at LANL, we run pretty light cluster nodes.
> 
> Here is a cluster node running xcpu (under busybox, as you can see):
>     1 ?        S      0:00 /bin/ash /linuxrc
>     2 ?        S      0:00 [migration/0]
>     3 ?        SN     0:00 [ksoftirqd/0]
>     4 ?        S      0:00 [watchdog/0]
>     5 ?        S      0:00 [migration/1]
>     6 ?        SN     0:00 [ksoftirqd/1]
>     7 ?        S      0:00 [watchdog/1]
>     8 ?        S      0:00 [migration/2]
>     9 ?        SN     0:00 [ksoftirqd/2]
>    10 ?        S      0:00 [watchdog/2]
>    11 ?        S      0:00 [migration/3]
>    12 ?        SN     0:00 [ksoftirqd/3]
>    13 ?        S      0:00 [watchdog/3]
>    14 ?        S<     0:00 [events/0]
>    15 ?        S<     0:00 [events/1]
>    16 ?        S<     0:00 [events/2]
>    17 ?        S<     0:00 [events/3]
>    18 ?        S<     0:00 [khelper]
>    19 ?        S<     0:00 [kthread]
>    26 ?        S<     0:00 [kblockd/0]
>    27 ?        S<     0:00 [kblockd/1]
>    28 ?        S<     0:00 [kblockd/2]
>    29 ?        S<     0:00 [kblockd/3]
>   105 ?        S      0:00 [pdflush]
>   106 ?        S      0:00 [pdflush]
>   107 ?        S      0:00 [kswapd1]
>   109 ?        S<     0:00 [aio/0]
>   108 ?        S      0:00 [kswapd0]
>   110 ?        S<     0:00 [aio/1]
>   111 ?        S<     0:00 [aio/2]
>   112 ?        S<     0:00 [aio/3]
>   697 ?        S<     0:00 [kseriod]
>   855 ?        S      0:00 xsrv -D 0 tcp!*!20001
>   857 ?        S      0:00 9pserve -u tcp!*!20001
>   864 ?        S      0:00 u9fs -a none -u root -m 65560 -p 564
>   865 ?        S      0:00 /bin/ash
> 
> See how little we have running? Oh, but wait, what's all that stuff
> in []? It's the stuff we can't turn off. Note there is per-cpu stuff,
> and other junk. Note that this node has been up for five hours, and
> this stuff is pretty quiet (0 run time); our nodes are the quietest
> (in the OS-interference sense) Linux nodes I have yet seen. But, that
> said, all this can hit you.
> 
> And in Linux, there's a lot of stuff people are finding you can't
> turn off. Lots of timers down there, lots of magic going on, and you
> just can't turn it off or adjust it, try as you might.
> 
> Plan 9, our conjecture goes, is a small, tight kernel with lots of
> stuff moved to user mode (file systems); and we believe that the Plan
> 9 architecture is a good match for future HPC (High Performance
> Computing) systems, as typified by Red Storm and BG/L: small,
> fixed-configuration nodes with memory, network, CPU, and nothing else.
> The ability to not even have a file system on the node is a big plus.
> The ability to make the file system transparently remote or local puts
> the application in the driver's seat as to how the node is configured
> and what tradeoffs are made; the system as a whole is incredibly
> flexible.
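> 
> (To illustrate that last point with the Plan 9 idiom -- a sketch only,
> with the server address made up: the same name in the namespace can be
> backed by a remote 9P server or by a local directory, and the
> application can't tell the difference:)
> 
>     #include <u.h>
>     #include <libc.h>
> 
>     void
>     main(void)
>     {
>         int fd;
> 
>         /* attach a remote file server at /n/fs
>            ("tcp!fileserver!564" is a made-up address) ... */
>         fd = dial("tcp!fileserver!564", 0, 0, 0);
>         if(fd < 0)
>             sysfatal("dial: %r");
>         if(mount(fd, -1, "/n/fs", MREPL, "") < 0)
>             sysfatal("mount: %r");
> 
>         /* ... or overlay a local directory at the same name;
>            either way the app just opens files under /n/fs */
>         /* bind("/tmp/local", "/n/fs", MREPL); */
>         exits(0);
>     }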
> 
> Our measurements, so far, do show that Plan 9 is "quieter" than Linux. A 
> full Plan 9 desktop has less OS noise than a Linux box at the login 
> prompt. This matters.
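> 
> (One common way to see this noise -- in the spirit of fixed-work
> microbenchmarks; a sketch, not our actual harness -- is to spin
> reading the clock and report any gap large enough to mean the OS took
> the CPU away. On Linux it might look like:)
> 
>     /* detect OS 'detours' by gaps between back-to-back clock reads */
>     #include <stdio.h>
>     #include <time.h>
> 
>     static long long
>     now_ns(void)
>     {
>         struct timespec ts;
> 
>         clock_gettime(CLOCK_MONOTONIC, &ts);
>         return ts.tv_sec * 1000000000LL + ts.tv_nsec;
>     }
> 
>     int
>     main(void)
>     {
>         long long prev, t;
> 
>         prev = now_ns();
>         for(;;){
>             t = now_ns();
>             /* consecutive reads are normally well under 1 us;
>                a bigger gap means we lost the CPU for a while */
>             if(t - prev > 10000)  /* 10 us threshold (arbitrary) */
>                 printf("detour: %lld ns\n", t - prev);
>             prev = t;
>         }
>     }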
> 
> But it only matters if people can run their apps. Hence our concern 
> about getting gcc-based cra-- er, applications code, running.
> 
> I'm not really trying to make Plan 9 look like Linux. I just want to run 
> MPQC for a friend of mine :-)
> 
> thanks
> 
> ron
