2%? Come on.

How do you plan to lose 'just 2%' if you make heavy use of MPI?

Let's be realistic: with respect to matrix calculations HPC can be relatively efficient. But as soon as we discuss algorithms that tend to be sequential, they are rather hard to parallelize on an HPC box. Even very good scientists usually lose a factor of 50 or so there, algorithmically.
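
To make that loss concrete, here is a rough Amdahl's-law sketch; the serial fractions and processor count below are illustrative assumptions, not measurements of any particular code:

    /* Rough Amdahl's-law sketch: speedup = 1 / (s + (1 - s) / p),
     * where s is the serial fraction and p the number of processors.
     * The serial fractions below are illustrative assumptions only. */
    #include <stdio.h>

    static double amdahl(double serial_fraction, int procs)
    {
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / procs);
    }

    int main(void)
    {
        int procs = 1024;
        double fractions[] = { 0.001, 0.01, 0.05 };
        for (int i = 0; i < 3; i++)
            printf("serial %.3f on %d procs -> speedup %.1f\n",
                   fractions[i], procs, amdahl(fractions[i], procs));
        return 0;
    }

Even a 5% serial fraction caps a 1024-processor run at roughly a 20x speedup, which is where factors like the 50 above come from.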

It is questionable whether software that is embarrassingly parallel should be run on megamillion-dollar machines that are easily a factor of 5 less efficient in power, provided it can run reasonably well on normal PC/CUDA/Brook type hardware
(meaning that some scientists love RAM just a tad too much; I'd argue there is almost always an algorithm possible, though sometimes very complex, that gets a lot of performance with a tad less RAM,
after which you can move back to cheaper hardware).

I'd argue there is a very BIG market for a shared-memory NUMA approach, one that however has a better solution for I/O and timing (so not using some sort of central clock and central I/O processors
like SGI did on the Origin boxes).

The few shared-memory machines that historically were faster than a PC were so much more expensive than a PC, just to increase speed by a factor of 2, that it is interesting to see what will happen here.

The step from writing multithreaded/multiprocessing software that works on NUMA hardware to
an MPI-type model is really big.
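
As a minimal illustration of that step (the array size and the computation are made up for the example): what is a plain loop over a shared buffer in the NUMA world becomes explicit decomposition plus explicit message traffic under MPI.

    /* Minimal sketch (illustrative only): summing an array.
     * Shared memory: every thread simply reads the same buffer.
     * MPI: each rank owns only its slice, and the partial results
     * must be explicitly reduced across processes. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int N = 1 << 20;          /* total elements, chosen arbitrarily */
        int chunk = N / size;           /* each rank works on its own slice   */
        double local = 0.0, global = 0.0;

        for (int i = 0; i < chunk; i++) /* local computation only */
            local += 1.0;               /* stand-in for the real work */

        /* the explicit communication step shared memory never needs */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %.0f\n", global);
        MPI_Finalize();
        return 0;
    }

Every data structure that was simply "there" in shared memory has to be partitioned and communicated like this, which is exactly why the porting step is so big.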

What happens as a result is that those MPI-type codes are usually not very well optimized programs: the "one-eyed software in the land of the blind", so to speak.

Sometimes that has very egoistic reasons. I've seen cases where doing more calculations gives bigger round-off errors, which after a few months propagate back into the root big time, sometimes letting the scientist draw the result he wanted to draw, instead of objectively having to explain why the 'commercial' model that computes quickly (such a model sometimes exists, which is how we know this) doesn't show those weird 'random' results, so no new theory can be concluded.

I would be really amazed if more than 50% of the people on this HPC list get an efficiency of over 2% on their typical workloads.

We simply shouldn't praise ourselves as being better than we are. Having lots of processors also makes most scientists very lazy. That isn't bad at all; the reason the majority of scientists use HPC is that you can take a look into the future and see what happens,
giving an advantage over a PC.

That said, there are a few fields where the efficiency IS really, really high.

But other than some guys who are busy with encryption, I wouldn't be able to name a single one to you. Yet you could also argue that those guys in fact waste the most resources of anyone, as there are special co-processors (for embedded use, for example) and special dedicated processors (using a LOT of watts) that are thousands of times faster than what you can do on a generic CPU, in which case the 2% rule is still valid.

In HPC there is, however, one thing I really miss. I'm convinced it could exist: a kind of GPU-type CPU, with a lot of memory controllers attached, doing its calculations in double precision. A small team of 5 people could build it, and the clock would be, oh, 300-350 MHz or so?

So the investment in itself isn't big. Getting to 1 teraflop of double precision per chip shouldn't be a big problem.
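
A back-of-the-envelope check on what that would take; the clock values come from the range above, and the assumption that one FMA counts as 2 flops is mine, purely for illustration:

    /* Back-of-the-envelope sketch: how many double-precision FMA units
     * a hypothetical 300-350 MHz chip would need to reach 1 TFLOP.
     * The 2-flops-per-FMA convention is an illustrative assumption,
     * not a real design. */
    #include <stdio.h>

    int main(void)
    {
        double target_flops = 1e12;           /* 1 teraflop, double precision */
        double clocks[] = { 300e6, 350e6 };   /* the clock range mentioned above */

        for (int i = 0; i < 2; i++) {
            double flops_per_cycle = target_flops / clocks[i];
            double fma_units = flops_per_cycle / 2.0;  /* one FMA = 2 flops */
            printf("%.0f MHz: %.0f flops/cycle, ~%.0f FMA units\n",
                   clocks[i] / 1e6, flops_per_cycle, fma_units);
        }
        return 0;
    }

So roughly 1400-1700 wide double-precision FMA lanes at those clocks: a lot of silicon, but conceptually simple.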

Where is that CPU?

Did no one care to design it as they can't make billions of dollars with it?

Vincent

On Sep 25, 2008, at 12:20 AM, Mark Hahn wrote:

that, perhaps serendipitously, these service level delays due to nodes
not being completely optimized for cluster use don't result in a
significant reduction of computation speed until the size of the
cluster is about at the point where one would want a full-time admin
just to run the cluster.

no, not really. the issue is more like "how close to the edge are you?" it's the edge-closeness (relative to cluster capabilities) that matters.

that is, if your program has very frequent global synchronization,
you're going to want low jitter. yes, exponentially more so as the size of the job grows, but the importance of the issue also grows as your CPU increases in speed, as your interconnect improves, etc.
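
[A minimal sketch of the kind of "very frequent global synchronization" loop described here; the iteration count and the computation are illustrative only:

    /* Every iteration ends in a collective, so the slowest (most
     * jittered) rank sets the pace for all ranks on every step. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        double local = 1.0, global = 0.0;

        for (int step = 0; step < 100000; step++) {
            local *= 1.000001;              /* stand-in for a short compute phase */
            /* global collective every step: OS noise on any one rank
             * delays all ranks right here */
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }
]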

similarly, if you have an app which is finely cache-tuned,
it'll hurt, possibly a lot, when monitoring/etc takes a bite out.

don't worry about these service details too much, just do your work
knowing that you're maybe losing 2% speed (this number is a total
guesstimate).

2% might be reasonable if you're doing very non-edge stuff - for instance, a lot of embarrassingly parallel or serial-farm workloads that don't use a lot of memory. it's not that those workloads are less worthy, just that they tolerate a lot more sloppiness.

again, it's the nature of the workload, not just size of the cluster.
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

