They have a paper that explains it well and has some
interesting benchmarks.
http://sc06.supercomputing.org/schedule/pdf/pap225.pdf
this is quite interesting. I wish they had done benchmarks with doubles,
especially since they alluded to, for instance, the n-body calculation
really needing at least careful consideration of precision/resolution.
(now that I think of it, using 23 bits of mantissa on a 256^3 FFT
sounds numerically dubious too.)
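
to put a rough number on that worry, here is a back-of-envelope sketch (mine, not from the paper). it assumes the usual textbook error model for a Cooley-Tukey FFT, where relative error grows roughly like eps*sqrt(log2 N) on average and eps*log2 N in the worst case. the function name and the model are illustrative assumptions; actual error depends on the algorithm, the twiddle-factor accuracy and the data.

```python
import math

def fft_error_estimates(n_points, mantissa_bits):
    """Rough relative-error scale for an FFT over n_points elements.

    Assumed model (not from the paper): rms error ~ eps * sqrt(log2 N),
    worst-case error ~ eps * log2 N, with eps = 2^-mantissa_bits.
    """
    eps = 2.0 ** -mantissa_bits
    log_n = math.log2(n_points)
    return {"eps": eps, "rms": eps * math.sqrt(log_n), "worst": eps * log_n}

N = 256 ** 3                          # total points in a 256^3 FFT (= 2^24)
single = fft_error_estimates(N, 23)   # 23 stored mantissa bits (single)
double = fft_error_estimates(N, 52)   # 52 stored mantissa bits (double)

for name, est in (("single", single), ("double", double)):
    print(f"{name}: eps={est['eps']:.2e} rms~{est['rms']:.2e} "
          f"worst~{est['worst']:.2e}")
```

note that 256^3 is 2^24 points, so log2 N is about the same size as the mantissa width itself. under this model single precision still leaves several good digits per transform, but the margin is thin if errors accumulate over many timesteps.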
interesting that for a 2.4GHz Cell, they get at most 10 FP Gflops per SPE.
does anyone have SGEMM numbers for a 3GHz Intel Core2? I'll guess that
efficiency of libgoto with 2 threads would be >= 80%, so flops would be
0.8 efficiency * 2 cores * 8 flops/cycle * 3 GHz =~ 38 Gflops, or about
half a Cell chip.
makes it hard to argue for wide use of Cell, I think...
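
for anyone who wants to plug in their own numbers, the guess above is just a product of four factors. a minimal sketch follows; the 80% efficiency is my guess rather than a measurement, 8 single-precision flops/cycle/core is the assumed SSE peak for Core2, and 8 SPEs per Cell chip is assumed for the comparison. the function name is hypothetical.

```python
def sustained_gflops(efficiency, cores, flops_per_cycle, clock_ghz):
    """Estimated sustained Gflops: fraction of peak actually achieved."""
    return efficiency * cores * flops_per_cycle * clock_ghz

# Core2 guess: 80% of peak, 2 cores, 8 SP flops/cycle/core, 3 GHz.
core2 = sustained_gflops(0.8, 2, 8, 3.0)

# Cell: ~10 Gflops per SPE as reported at 2.4GHz, 8 SPEs assumed per chip.
cell = 8 * 10.0

print(f"Core2 estimate: {core2:.1f} Gflops")
print(f"Cell estimate:  {cell:.1f} Gflops")
print(f"Core2/Cell:     {core2 / cell:.2f}")
```

swapping in a measured SGEMM efficiency for the 0.8 is the obvious next step.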