32-bit mode on an 8-core (twin Xeon) Linux box. That core.cpuid bug really, really sucks.
I see matrix inversion takes longer with 4 cores than with 1!

|> scons runall
/usr/bin/python /home/Checkouts/Mercurial/SCons/bootstrap/src/script/scons.py runall
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c -ofeuclidean.o euclidean.d
gcc -o euclidean -m32 euclidean.o -L/home/users/russel/lib.Linux.ix86 -L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 -lpthread -lm -lrt
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c -ofmatrixInversion.o matrixInversion.d
gcc -o matrixInversion -m32 matrixInversion.o -L/home/users/russel/lib.Linux.ix86 -L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 -lpthread -lm -lrt
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c -ofmillionSqrt.o millionSqrt.d
gcc -o millionSqrt -m32 millionSqrt.o -L/home/users/russel/lib.Linux.ix86 -L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 -lpthread -lm -lrt
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c -ofparallelSort.o parallelSort.d
gcc -o parallelSort -m32 parallelSort.o -L/home/users/russel/lib.Linux.ix86 -L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 -lpthread -lm -lrt
dmd -I. -I/home/users/russel/lib/D -m32 -release -O -inline -c -ofpipelining.o pipelining.d
gcc -o pipelining -m32 pipelining.o -L/home/users/russel/lib.Linux.ix86 -L/home/users/russel/lib.Linux.x86_64/DMD2/lib32 -lparallelism -lphobos2 -lpthread -lm -lrt
runEverything(["runall"], ["euclidean", "matrixInversion", "millionSqrt", "parallelSort", "pipelining"])
======== euclidean ============
Serial reduce: 1104 milliseconds.
Parallel reduce with 4 cores: 324 milliseconds.
======== matrixInversion ============
Inverted a 256 x 256 matrix serially in 60 milliseconds.
Inverted a 256 x 256 matrix using 4 cores in 84 milliseconds.
======== millionSqrt ============
Parallel benchmarks being done with 4 cores.
Did serial millionSqrt in 980 milliseconds.
Did parallel foreach millionSqrt in 355 milliseconds.
Did parallel map millionSqrt in 249 milliseconds.
======== parallelSort ============
Serial quick sort: 5191 milliseconds.
Parallel quick sort: 1913 milliseconds.
======== pipelining ============
Did serial string -> float, euclid in 2236 milliseconds.
Did parallel string -> float, euclid with 4 cores in 1528 milliseconds.
scons: done building targets.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:[email protected]
41 Buckmaster Road    m: +44 7770 465 077   xmpp: [email protected]
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
