Re: [Beowulf] bizarre scaling behavior on a Nehalem

Mikhail Kuzminsky Wed, 12 Aug 2009 11:55:35 -0700

In message from Gus Correa <g...@ldeo.columbia.edu> (Wed, 12 Aug 2009 14:09:04 -0400):

Hi Bill, list

Bill:  This is very interesting indeed.  Thanks for sharing!


Bill's graph seems to show that Shanghai and Barcelona scale
(almost) linearly with the number of cores, whereas Nehalem stops
scaling and flattens out at 4 cores.

The Nehalem 8-core and 4-core curves are virtually indistinguishable,
and for very large arrays 4 cores is ahead.
Only for huge arrays (>16M) does Nehalem get ahead
of Shanghai and Barcelona.


IMHO, if the arrays are not "huge", they will fit in the L3 cache (8 MB!).
Or is the X axis in Mwords?

Mikhail
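
A rough way to check the cache argument above is to compare the benchmark's working set with the 8 MB L3. The sketch below assumes a STREAM-style triad that touches three arrays of 8-byte doubles. The helper names are made up for illustration, and whether the graph's X axis counts bytes, elements, or words is not stated in this thread.

```c
#include <stdint.h>

/* Bytes touched by a STREAM-style kernel that uses `narrays` arrays of
 * `nelems` doubles each (3 arrays for triad: a[i] = b[i] + s * c[i]).
 * Assumption: the benchmark stores doubles, as McCalpin's stream does. */
static uint64_t working_set_bytes(uint64_t nelems, unsigned narrays)
{
    return nelems * (uint64_t)narrays * sizeof(double);
}

/* Returns 1 if the whole working set fits in a cache of `cache_bytes`,
 * 0 otherwise. This ignores associativity and other data in the cache,
 * so it is an upper bound on what can stay resident. */
static int fits_in_cache(uint64_t nelems, unsigned narrays,
                         uint64_t cache_bytes)
{
    return working_set_bytes(nelems, narrays) <= cache_bytes;
}
```

Under those assumptions, three arrays of 16M (2^24) doubles come to 384 MiB, far beyond the L3, while anything up to roughly 349,000 elements per array stays under 8 MiB. If the axis is in bytes or words instead, the crossover point shifts accordingly.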

Did I interpret the graph right?
Wasn't this the type of scaling problem that plagued
the Clovertown and Harpertown?
Any possibility that kernels, BIOS, etc., are not yet ready for Nehalem?
Thanks,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Bill Broadley wrote:
I've been working on a pthread memory benchmark that is loosely modeled on
McCalpin's stream. It's been quite a challenge to remove all the noise/lost
performance from the benchmark to get close to performance I expected. Some
of the obstacles:
* For the compilers that tend to be better at stream (open64 and pathscale),
  you lose the performance if you just replace double a[],b[],c[] with
  double *a,*b,*c. Patch[1] available. I don't have a work around for
  this, suggestions welcome. Is it really necessary for dynamic arrays
  to be substantially slower than static?
* You have to be very careful with pointer alignment both with cache lines,
  and each other
* cpu_affinity (by CPU id)
* numa (by socket id)
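
For the first two obstacles, here is a minimal sketch of what dynamically allocated, cache-line-aligned arrays could look like in C11. The 64-byte line size, the helper names, and the use of restrict are assumptions for illustration and are not taken from Bill's patch. restrict is only one thing to try; nothing in this thread establishes that it recovers the static-array performance on open64 or pathscale.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Allocate n doubles starting on a cache-line boundary.
 * aligned_alloc (C11) expects the size to be a multiple of the
 * alignment, so the byte count is rounded up. Returns NULL on failure;
 * release with free(). Overflow of n * sizeof(double) is not checked. */
static double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (rounded == 0)
        rounded = CACHE_LINE;
    return aligned_alloc(CACHE_LINE, rounded);
}

/* STREAM-style triad on pointer arguments. restrict tells the compiler
 * the three arrays do not overlap, which it can see for itself with
 * static arrays but cannot assume for plain double pointers. */
static void triad(double *restrict a, const double *restrict b,
                  const double *restrict c, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}
```

A caller would allocate a, b and c with alloc_aligned_doubles, fill b and c, and time repeated calls to triad. Controlling the alignment of the arrays relative to each other, as the second bullet mentions, would need extra padding on top of this.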
The results are relatively smooth graphs. Here's an example; it's uselessly
busy until you toggle off a few graphs (by clicking on the key):

http://cse.ucdavis.edu/bill/pstream.svg
The biggest puzzle I have now is why the previous generation Intel quads, the
current generation AMD quads, and numerous other CPUs show a big benefit in
L1, while the Nehalem shows no benefit.

[1] http://cse.ucdavis.edu/bill/stream-malloc.patch


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
