Happy New Year, all.

terry mcintyre: <[EMAIL PROTECTED]>:
>I have been tinkering with OpenMP and my new HP Quad
>Intel 6600. Wrote a small program to compute the
>Taylor series of e and pi, just for exploration, and
>I've found some interesting data points.
>
>I am using gcc 4.2 and 4.3.1 - the latter being the
>head of the SVN repository. Kubuntu 7.10, both 32 and
>64 bit versions. One of my test programs is attached.
>
>Oddly, the OpenMP version is no faster than the
>single-threaded version - but it does keep the cores
>busier. It is possible that I am doing something
>wrong, as I am new to OpenMP.
>
>I was so puzzled by the results that I tried the same
>program on my AMD Athlon X2. The older AMD Athlon duo,
>with a 1 GHz clock, 64-bit Fedora Core 7, is 20%
>faster than the 1.6GHz quad 6600. I've also run the
>--monte-carlo version of GnuGo 4.7.11 on both
>machines, with similar results.
>
>The compilation line is:
>gcc -Wall -fopenmp -O3 -march=native -lgomp taylor3.c
>-o taylor3
>
>( the code is an adaptation of code from the OpenMP
>tutorial at http://kallipolis.com/openmp/ - which
>leads to another interesting discovery. The original
>code yields incorrect results for pi; the two parallel
>branches use the same index variable i,
>and one stomps on the other. Is this a feature of the
>gcc version of OpenMP? I'll be testing Intel's icc
>soon. )
>
>I'll be doing more testing this weekend, but I'd like
>to know if anyone has compared the Intel 6600 to other
>processors. So far, it sure looks like a tired old nag
>on her last ride to the glue factory; I'm wishing that
>I had waited for the Penryn version.
>
>One more puzzle: this processor is rated at 2.4GHz,
>but cpuinfo tells a different story:

That is SpeedStep at work: the clock is scaled down while the CPU is
lightly loaded, which is why cpuinfo shows 1596 MHz. You can disable it
in the BIOS settings.
http://en.wikipedia.org/wiki/SpeedStep

-Hideki

>[EMAIL PROTECTED]:/proc$ cat cpuinfo
>processor : 0
>vendor_id : GenuineIntel
>cpu family : 6
>model : 15
>model name : Intel(R) Core(TM)2 Quad CPU Q6600
> @ 2.40GHz
>stepping : 11
>cpu MHz : 1596.000
>cache size : 4096 KB
>physical id : 0
>siblings : 4
>core id : 0
>cpu cores : 4
>fpu : yes
>fpu_exception : yes
>cpuid level : 10
>wp : yes
>flags : fpu vme de pse tsc msr pae mce cx8
>apic sep mtrr pge mca cmov pat pse36 clflush dts acpi
>mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
>pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
>bogomips : 4804.08
>clflush size : 64
>cache_alignment : 64
>address sizes : 36 bits physical, 48 bits virtual
>power management:
>
>
>Terry McIntyre <[EMAIL PROTECTED]>
>
>Wherever is found what is called a paternal government, there is found state
>education. It
>has been discovered that the best way to insure implicit obedience is to
>commence tyranny in
>the nursery.
>
>Benjamin Disraeli, Speech in the House of Commons [June 15, 1874]
>
>/*
> * taylor.c
> *
> * calculate e and pi by their taylor expansions and multiply them
> * together.
> *
> * moved local variables inside parallel blocks ( performance tweak? )
> */
>
>#include <omp.h>
>#include <stdio.h>
>#include <time.h>
>
>#define num_steps 20000000
>
>int main(int argc, char *argv[])
>{
>    double start, stop; /* times of beginning and end of procedure */
>    double efinal, pifinal, product;
>
>    /* start the timer */
>    start = clock();
>
>    /* calculate e and pi in parallel */
>#pragma omp parallel sections shared(efinal,pifinal)
>    {
>#pragma omp section
>        { /* calculate e using Taylor approximation */
>            register double e, factorial;
>            register int j;
>
>            e = 1;
>            factorial = 1;
>            for (j = 1; j<num_steps; j++) {
>                factorial *= j;
>                e += 1.0/factorial;
>            }
>            efinal=e;
>        } /* e section */
>
>#pragma omp section
>        { /* calculate pi expansion */
>            register int i;
>            register double pi;
>
>            pi = 0;
>            for (i = 0; i < num_steps*10; i++) {
>                /* we want 1/1 - 1/3 + 1/5 - 1/7 etc.
>                   therefore we count by fours (0, 4, 8, 12...) and take
>                     1/(0+1) =  1/1
>                   - 1/(0+3) = -1/3
>                     1/(4+1) =  1/5
>                   - 1/(4+3) = -1/7 and so on */
>                pi += 1.0/(i*4.0 + 1.0);
>                pi -= 1.0/(i*4.0 + 3.0);
>            }
>            pi = pi * 4.0;
>            pifinal=pi;
>        } /* pi section */
>
>    } /* omp sections */
>    /* threads rejoin here */
>
>    product = efinal * pifinal;
>
>    stop = clock();
>
>    printf("e %f pi %f products = %f reached in %.3f seconds\n", efinal,
>           pifinal, product,
>(double)(stop-start)/CLOCKS_PER_SEC);
>
>    return 0;
>}
>---- inline file
>_______________________________________________
>computer-go mailing list
>[email protected]
>http://www.computer-go.org/mailman/listinfo/computer-go/
--
[EMAIL PROTECTED] (Kato)
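
On the "no faster" puzzle, a possible explanation, offered as a sketch rather than a diagnosis: with glibc on Linux, clock() reports CPU time summed over all threads of the process, so a run spread across cores can show the same figure (or a larger one) even when elapsed time drops. Separately, the attached program has only two sections, and the pi section runs ten times as many iterations as the e section, so the pi loop alone bounds the elapsed time. The code below times with POSIX clock_gettime and parallelises the pi loop itself with a reduction. The names wall_seconds, cpu_seconds and leibniz_pi are mine, not from the attached program.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Elapsed (wall-clock) seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* CPU seconds used by the whole process; on Linux/glibc this is
   summed over every thread, so it does not shrink with more cores. */
static double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Leibniz series 4*(1/1 - 1/3 + 1/5 - 1/7 ...), two terms per step.
   The reduction gives each thread its own partial sum and adds them
   at the end. Without -fopenmp the pragma is ignored and the loop
   simply runs serially. */
static double leibniz_pi(long steps)
{
    double sum = 0.0;
    long i;

    assert(steps > 0);
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < steps; i++) {
        sum += 1.0 / (4.0 * i + 1.0) - 1.0 / (4.0 * i + 3.0);
    }
    return 4.0 * sum;
}
```

To compare the two clocks, record wall_seconds() and cpu_seconds() before and after a call such as leibniz_pi(200000000L) and print both differences. Built with gcc -O2 -fopenmp, I would expect the wall-clock figure to fall as cores are added while the CPU figure stays roughly flat, but I have not measured this on a Q6600. Older glibc versions may also need -lrt for clock_gettime.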

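
On the question about the index variable i: as far as I know this is not specific to gcc. In OpenMP, a variable declared outside the parallel region is shared between threads unless a clause says otherwise, so two sections looping on the same outer i will overwrite each other's counter. I have not seen the tutorial's original code, so the sketch below only assumes that it declared i outside the region. It shows one fix, the private(i) clause; declaring the index inside each section, as the attached program does, works as well. The function name two_sums is hypothetical.

```c
#include <assert.h>

/* Two independent loops that both use an index declared outside the
   parallel region. private(i) gives each thread its own copy of i.
   Each section writes only its own accumulator, so a and b can stay
   shared without a race. */
static void two_sums(int n, long *sum_out, long *squares_out)
{
    int i;                      /* outside the region: shared by default */
    long a = 0, b = 0;

    assert(n >= 0);
#pragma omp parallel sections private(i) shared(a, b)
    {
#pragma omp section
        {
            for (i = 1; i <= n; i++)
                a += i;                 /* 1 + 2 + ... + n */
        }
#pragma omp section
        {
            for (i = 1; i <= n; i++)
                b += (long)i * i;       /* 1 + 4 + ... + n*n */
        }
    }
    *sum_out = a;
    *squares_out = b;
}
```

Dropping private(i) and building with -fopenmp should make the results vary from run to run, which matches the "one stomps on the other" behaviour described above.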