These numbers make "sense". k-point parallelization:
In "real" cases one has to solve an eigenvalue problem for "many" k-points ("many" typically means 10-1000). In these cases k-point parallelism is very efficient.

The benchmark case has only ONE k-point in its *.klist file, thus there is no k-point parallelism. When you edit the *.klist file (e.g. by repeating the first line 8 times), you will see that the sequential run takes almost exactly 8 times as long. However, with k-point parallelism you will probably get a speedup of 4-6 on your machine.

Still, one can find out what is more efficient on your specific machine: use (for 8 k-points) 8 lines in .machines with OMP_NUM_THREADS=1, or only 4 lines with OMP_NUM_THREADS=2.

I'm sure the colleagues from physics/chemistry can explain the "k-points" to you.

Regards

Todd Pfaff schrieb:
> I get much better timings for the serial benchmark using an ifort+mkl
> version of wien2k on the same machine. I'm not seeing any speedup
> with k-point parallelization yet though.
>
> - machine: dual Xeon quad-core E5430 @ 2.66GHz with 8GB 667MHz RAM
>
> 1) timings for wien2k-08.2-20080407 built with
> - ifort 10.1.017
> - mkl 10.0.3.020
>
> 1.1) wien2k serial benchmark
> - x lapw1 -c
> - varying OMP_NUM_THREADS from 1 to 8
>
> OMP_NUM_THREADS=1: 116.292u 0.386s 1:56.69 99.9% 0+0k 0+33256io 0pf+0w
> OMP_NUM_THREADS=2: 148.964u 0.963s 1:17.11 194.4% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=3: 182.932u 1.495s 1:11.11 259.3% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=4: 213.973u 1.356s 1:03.52 338.9% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=5: 251.813u 2.195s 1:03.51 399.9% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=6: 294.103u 2.429s 1:02.11 477.4% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=7: 329.413u 2.686s 1:01.91 536.4% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=8: 374.467u 2.488s 1:01.12 616.7% 0+0k 0+33240io 0pf+0w
>
> 1.2) wien2k serial benchmark run with k-point parallelism
> - process started with command 'x lapw1 -p'
> - OMP_NUM_THREADS=1, GOTO_NUM_THREADS=1
> - varying .machines file with N lines, N
> from 1 to 8, where each line is:
>
> 1:localhost
>
> k-point parallel N=1: localhost k=1 user=116.173 wallclock=116.59
> k-point parallel N=2: localhost k=1 user=116.312 wallclock=116.79
> k-point parallel N=3: localhost k=1 user=116.254 wallclock=116.66
> k-point parallel N=4: localhost k=1 user=116.306 wallclock=116.76
> k-point parallel N=5: localhost k=1 user=116.09  wallclock=116.52
> k-point parallel N=6: localhost k=1 user=116.218 wallclock=116.66
> k-point parallel N=7: localhost k=1 user=116.251 wallclock=116.68
> k-point parallel N=8: localhost k=1 user=116.372 wallclock=116.79
>
>
> 2) timings for wien2k-08.2-20080407 built with
> - GNU Fortran (GCC) 4.2.3 (4.2.3-6mnb1)
> - GotoBLAS-1.26
>
> 2.1) wien2k serial benchmark
> - x lapw1 -c
> - varying OMP_NUM_THREADS from 1 to 8
>
> OMP_NUM_THREADS=1: 195.463u 0.307s 3:15.80 99.9% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=2: 199.565u 0.569s 2:57.40 112.8% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=3: 204.145u 0.635s 2:51.02 119.7% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=4: 211.666u 0.736s 2:49.02 125.6% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=5: 222.604u 1.032s 2:48.41 132.7% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=6: 231.258u 0.927s 2:47.54 138.5% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=7: 243.170u 0.996s 2:46.55 146.5% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=8: 252.584u 0.916s 2:46.57 152.1% 0+0k 0+33264io 0pf+0w
>
>
> --
> Todd Pfaff <pfaff at mcmaster.ca>
> Research & High-Performance Computing Support
> McMaster University, Hamilton, Ontario, Canada
> http://www.rhpcs.mcmaster.ca/~pfaff
>
>
> On Tue, 12 Aug 2008, Peter Blaha wrote:
>
>> Looking at these numbers tells me that you probably should invest in
>> ifort + mkl. It makes no sense to buy expensive new hardware if, with
>> bad software, it runs slower than a 6-year-old PC.
>> Compare your timing with the benchmark page to see what is possible.
>>
>> k-point parallelization: Please read the UG! This is fairly simple.
>>
>> 1:localhost:4 utilizes the mpi-parallel version;
>>
>> you need to put N lines
>>
>> 1:localhost
>> 1:localhost
>> ...
>>
>> to specify running N lapw1 processes in parallel.
>>
>> Todd Pfaff schrieb:
>>> Peter, thanks for the response.
>>>
>>> I'm getting a small speedup from multithreading in libgoto. Here are
>>> timings from the wien2k serial benchmark:
>>>
>>> OMP_NUM_THREADS=1: 195.463u 0.307s 3:15.80 99.9% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=2: 199.565u 0.569s 2:57.40 112.8% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=3: 204.145u 0.635s 2:51.02 119.7% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=4: 211.666u 0.736s 2:49.02 125.6% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=5: 222.604u 1.032s 2:48.41 132.7% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=6: 231.258u 0.927s 2:47.54 138.5% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=7: 243.170u 0.996s 2:46.55 146.5% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=8: 252.584u 0.916s 2:46.57 152.1% 0+0k 0+33264io 0pf+0w
>>>
>>>
>>> I would like to explore the k-point parallelization. But when I run
>>> 'x lapw1 -p' it aborts with an error message about being unable to run
>>> lapw1c_mpi. It appears to me that it's trying to run the fine-grained
>>> MPI-parallel version. I'm not building wien2k with mpi, so I don't have a
>>> lapw1c_mpi. I must be misunderstanding something. What am I doing wrong
>>> that's causing it to try to run this lapw1c_mpi executable?
>>>
>>> Which of these are appropriate .machines files to do k-point
>>> parallelization across N cpu cores on a single machine?
>>>
>>> This?
>>>
>>> 1:localhost:N
>>>
>>> Or this?
>>>
>>> N:localhost
>>>
>>> And do I need any of these lines?
>>>
>>> extrafine
>>> granularity:1
>>> residue:localhost
>>>
>>> Or do I need something else either in .machines or in some other
>>> file or on the command line?
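[For reference, a minimal .machines file for the k-point-parallel case asked about above, following the convention Peter describes (one `1:localhost` line per simultaneous lapw1 process; here 8-way on a single 8-core node). The thread does not indicate that `extrafine`, `granularity`, or `residue` lines are needed for this simple single-node case:]

```
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
```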
>>>
>>> --
>>> Todd Pfaff <pfaff at mcmaster.ca>
>>> Research & High-Performance Computing Support
>>> McMaster University, Hamilton, Ontario, Canada
>>> http://www.rhpcs.mcmaster.ca/~pfaff
>>>
>>> On Mon, 11 Aug 2008, Peter Blaha wrote:
>>>
>>>> The program lapw1 spends a large fraction of its time in BLAS routines,
>>>> thus it can benefit from multithreading of GOTOLIBS (or MKL).
>>>> Setting the variables you mentioned to 2 (or 4), you should see a
>>>> speedup. The improvement may depend on many factors, but it will be at
>>>> most about 50%.
>>>>
>>>> Another possibility to utilize the multiple cores is to do k-point
>>>> parallelism.
>>>> Generate a .machines file with 2, 4 or 8 times your machine name
>>>> and test the performance with x lapw1 -p.
>>>> On some architectures (with a slow memory bus) it can be that only 4
>>>> parallel jobs give the best performance (because the slow memory bus
>>>> cannot feed all 8 cpus properly); on others you can use 8 parallel jobs.
>>>> Sometimes a mixture (4 k-point parallel + OMP_NUM_THREADS=2) is best.
>>>>
>>>> Todd Pfaff schrieb:
>>>>> We're using:
>>>>>
>>>>> wien2k-08.2-20080407
>>>>>
>>>>> built with:
>>>>>
>>>>> GNU Fortran (GCC) 4.2.3 (4.2.3-6mnb1)
>>>>> GotoBLAS-1.26
>>>>>
>>>>> and running on an 8-core (2 x quad-core) Xeon machine.
>>>>>
>>>>> Can wien2k take advantage of the multithreading inherent to GotoBLAS
>>>>> when either GOTO_NUM_THREADS or OMP_NUM_THREADS is set?
>>>>>
>>>>> If so, can someone provide, or direct me to, a document about details of
>>>>> the best way to build and run wien2k for such an environment?
>>>>>
>>>>> Thank you,
>>>>> --
>>>>> Todd Pfaff <pfaff at mcmaster.ca>
>>>>> Research & High-Performance Computing Support
>>>>> McMaster University, Hamilton, Ontario, Canada
>>>>> http://www.rhpcs.mcmaster.ca/~pfaff
>>>>> _______________________________________________
>>>>> Wien mailing list
>>>>> Wien at zeus.theochem.tuwien.ac.at
>>>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

--
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671  Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------
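[The tuning recipe discussed in this thread, choosing N k-point-parallel lapw1 jobs so that N times OMP_NUM_THREADS matches the core count, can be sketched as a small shell script. This is a sketch only: `make_machines` is a hypothetical helper, not part of WIEN2k, and the `x lapw1 -p` invocations are left commented out because they require a working WIEN2k case directory:]

```shell
#!/bin/sh
# Sketch, assuming an 8-core node as in this thread: write a .machines
# file with N "1:localhost" lines, one per simultaneous k-point-parallel
# lapw1 process. make_machines is a hypothetical helper for illustration.
make_machines() {
    n=$1
    : > .machines                      # truncate/create the file
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "1:localhost" >> .machines
        i=$((i + 1))
    done
}

# Variant A: 8 k-point-parallel jobs, single-threaded BLAS
make_machines 8
export OMP_NUM_THREADS=1
# x lapw1 -p        # run inside a real WIEN2k case directory

# Variant B: 4 k-point-parallel jobs, 2 BLAS threads each
make_machines 4
export OMP_NUM_THREADS=2
# x lapw1 -p        # compare wall-clock times of the two variants
```

[Which variant wins depends on the memory bus, as Peter notes above; on the dual quad-core Xeon in this thread, 4-6x is the realistic ceiling either way.]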