32-bit mode on a 8-core (twin Xeon) Linux box.  That core.cpuid bug
really, really sucks.

I see matrix inversion takes longer with 4 cores than with 1!

|> scons runall
/usr/bin/python /home/Checkouts/Mercurial/SCons/bootstrap/src/script/scons.py 
runall
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c -ofeuclidean.o 
euclidean.d
gcc -o euclidean -m32 euclidean.o -L/home/users/russel/lib.Linux.ix86 
-L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 
-lpthread -lm -lrt
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c 
-ofmatrixInversion.o matrixInversion.d
gcc -o matrixInversion -m32 matrixInversion.o 
-L/home/users/russel/lib.Linux.ix86 
-L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 
-lpthread -lm -lrt
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c -ofmillionSqrt.o 
millionSqrt.d
gcc -o millionSqrt -m32 millionSqrt.o -L/home/users/russel/lib.Linux.ix86 
-L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 
-lpthread -lm -lrt
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c 
-ofparallelSort.o parallelSort.d
gcc -o parallelSort -m32 parallelSort.o -L/home/users/russel/lib.Linux.ix86 
-L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 
-lpthread -lm -lrt
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c -ofpipelining.o 
pipelining.d
gcc -o pipelining -m32 pipelining.o -L/home/users/russel/lib.Linux.ix86 
-L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 
-lpthread -lm -lrt
runEverything(["runall"], ["euclidean", "matrixInversion", "millionSqrt", 
"parallelSort", "pipelining"])

========  euclidean  ============
Serial reduce:  1104 milliseconds.
Parallel reduce with 4 cores:  324 milliseconds.

========  matrixInversion  ============
Inverted a 256 x 256 matrix serially in 60 milliseconds.
Inverted a 256 x 256 matrix using 4 cores in 84 milliseconds.

========  millionSqrt  ============
Parallel benchmarks being done with 4 cores.
Did serial millionSqrt in 980 milliseconds.
Did parallel foreach millionSqrt in 355 milliseconds.
Did parallel map millionSqrt in 249 milliseconds.

========  parallelSort  ============
Serial quick sort:  5191 milliseconds.
Parallel quick sort:  1913 milliseconds.

========  pipelining  ============
Did serial string -> float, euclid in 2236 milliseconds.
Did parallel string -> float, euclid with 4 cores in 1528 milliseconds.

scons: done building targets.


-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:[email protected]
41 Buckmaster Road    m: +44 7770 465 077   xmpp: [email protected]
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

Attachment: signature.asc
Description: This is a digitally signed message part

Reply via email to