In message from Bill Broadley <b...@cse.ucdavis.edu> (Thu, 13 Aug 2009
17:09:24 -0700):
Do I understand correctly that these results are for 4 cores and 4 OpenMP
threads?
And what is the DDR3 RAM: DDR3/1066?
Mikhail
I tried open64-4.2.2 with those flags on a single-socket Nehalem:
$ opencc -O4 -fopenmp stream.c -o stream-open64 -static
$ opencc -O4 -fopenmp stream-malloc.c -o stream-open64-malloc -static
$ ./stream-open64
Total memory required = 457.8 MB.
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:          22061.4958     0.0145     0.0145     0.0146
Scale:         22228.4705     0.0144     0.0144     0.0145
Add:           20659.2638     0.0233     0.0232     0.0233
Triad:         20511.0888     0.0235     0.0234     0.0235
Dynamic (arrays allocated with malloc):
$ ./stream-open64-malloc
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:          14436.5155     0.0222     0.0222     0.0222
Scale:         14667.4821     0.0218     0.0218     0.0219
Add:           15739.7070     0.0305     0.0305     0.0305
Triad:         15770.7775     0.0305     0.0304     0.0305
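
As a cross-check on the numbers above, here is a small C sketch of how the Rate column relates to the Min time column. It assumes the usual STREAM accounting (Copy and Scale move two arrays' worth of 8-byte doubles per pass, Add and Triad move three, and 1 MB = 10^6 bytes). The array length of 20,000,000 used in the note below is inferred from the "Total memory required = 457.8 MB" line, not stated anywhere in this thread. The names stream_bytes and stream_rate_mbs are made up for illustration.

```c
#include <stddef.h>

/* The four STREAM kernels. */
enum stream_kernel { STREAM_COPY, STREAM_SCALE, STREAM_ADD, STREAM_TRIAD };

/* Bytes moved in one pass over arrays of n doubles.
 * Assumption: Copy and Scale touch two arrays, Add and Triad touch three. */
double stream_bytes(enum stream_kernel k, size_t n)
{
    int arrays = (k == STREAM_COPY || k == STREAM_SCALE) ? 2 : 3;
    return (double)arrays * (double)sizeof(double) * (double)n;
}

/* Rate in MB/s (1 MB = 1e6 bytes) from the minimum time in seconds. */
double stream_rate_mbs(enum stream_kernel k, size_t n, double min_time_s)
{
    return 1.0e-6 * stream_bytes(k, n) / min_time_s;
}
```

With n = 20000000 and the 0.0145 s Copy time, stream_rate_mbs gives roughly 22069 MB/s, which is in line with the reported 22061 MB/s given that the printed time is rounded to four decimals.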
Intel C/C++ Compiler 10.1 on Harpertown CPUs:
Base OPT flags: -O2 -xT -ansi-alias -ip -i-static
Intel recently used
Intel C/C++ Compiler 11.0.081 on Nehalem CPUs:
-O2 -xSSE4.2 -ansi-alias -ip
and got good STREAM results in their HPCC submission on their
Endeavor cluster.
$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream.c -o stream-icc
$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream-malloc.c -o stream-icc-malloc
$ ./stream-icc | grep ":"
STREAM version $Revision: 5.9 $
Copy: 14767.0512 0.0022 0.0022 0.0022
Scale: 14304.3513 0.0022 0.0022 0.0023
Add: 15503.3568 0.0031 0.0031 0.0031
Triad: 15613.9749 0.0031 0.0031 0.0031
$ ./stream-icc-malloc | grep ":"
STREAM version $Revision: 5.9 $
Copy: 14604.7582 0.0022 0.0022 0.0022
Scale: 14480.2814 0.0022 0.0022 0.0022
Add: 15414.3321 0.0031 0.0031 0.0031
Triad: 15738.4765 0.0031 0.0030 0.0031
So ICC shows no penalty for the dynamic arrays, but alas it is no faster
than open64 with the penalty.
I'll attempt to track down the HPCC STREAM source code to see if their
dynamic arrays are any friendlier than mine (I just use malloc).
In any case many thanks for the pointer.
Oh, my dynamic tweak:
$ diff stream.c stream-malloc.c
43a44
> # include <stdlib.h>
97c98
< static double a[N+OFFSET],
---
> /* static double a[N+OFFSET],
99c100,102
< c[N+OFFSET];
---
> c[N+OFFSET]; */
>
> double *a, *b, *c;
134a138,142
>
> a=(double *)malloc(sizeof(double)*(N+OFFSET));
> b=(double *)malloc(sizeof(double)*(N+OFFSET));
> c=(double *)malloc(sizeof(double)*(N+OFFSET));
>
283c291,293
<
---
> free(a);
> free(b);
> free(c);
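
For anyone who wants to try the dynamic variant outside of stream.c, the change in the diff can be sketched as a pair of standalone C functions. This is only an illustration, not the STREAM source: alloc_arrays and triad are hypothetical names, and the NULL check is an addition that the diff above does not have. The OpenMP pragma is guarded so the code also builds without OpenMP enabled.

```c
#include <stdlib.h>

/* Triad kernel over caller-supplied arrays: a[j] = b[j] + scalar * c[j]. */
void triad(double *a, const double *b, const double *c, double scalar, size_t n)
{
    long j;
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (j = 0; j < (long)n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* Allocate the three arrays on the heap with plain malloc, as in the diff.
 * Returns 0 on success; on failure frees whatever was allocated,
 * sets all three pointers to NULL, and returns -1. */
int alloc_arrays(size_t n, double **a, double **b, double **c)
{
    *a = malloc(sizeof(double) * n);
    *b = malloc(sizeof(double) * n);
    *c = malloc(sizeof(double) * n);
    if (*a == NULL || *b == NULL || *c == NULL) {
        free(*a);
        free(*b);
        free(*c);
        *a = *b = *c = NULL;
        return -1;
    }
    return 0;
}
```

A caller would run alloc_arrays(N + OFFSET, &a, &b, &c), fill b and c, time repeated calls to triad, and then free the three pointers, mirroring the malloc and free lines in the diff.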
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf