Re: [Wien] Parallel execution on new Intel CPUs

pluto via Wien Tue, 14 Feb 2023 02:33:19 -0800

Dear Profs. Blaha, Marks,

Thank you for the information!

Could you give an estimate of the speed-up I could expect from MPI parallelization?

My tests on a 36-inequivalent-atom slab so far indicate that there is nearly no difference between different k-parallel and OMP settings. So far I tried:


8x 1:localhost with OMP=2
16x 1:localhost with OMP=1
16x 1:localhost with OMP=2 (means slight overloading)

and the time per SCF cycle (runsp without so) is practically the same in all of these. Later I will also try higher OMP with fewer 1:localhost lines, but I doubt this can possibly be faster.
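
The three settings above can be compared by the number of threads they request. The following is only a minimal sketch: it takes the core and thread counts quoted for the i7-13700K later in this thread, and the function names (total_threads, describe) are hypothetical helpers, not part of WIEN2k.

```python
# Hardware figures quoted in this thread for the i7-13700K.
P_CORES = 8                        # performance cores, 2 hardware threads each
E_CORES = 8                        # efficient cores, 1 hardware thread each
PHYSICAL_CORES = P_CORES + E_CORES
HW_THREADS = 2 * P_CORES + E_CORES


def total_threads(n_lines, omp):
    """Threads requested by n_lines k-parallel jobs running omp threads each."""
    return n_lines * omp


def describe(n_lines, omp):
    """Classify a setting against the core and thread counts above."""
    t = total_threads(n_lines, omp)
    if t <= PHYSICAL_CORES:
        status = "fits on physical cores"
    elif t <= HW_THREADS:
        status = "needs hyperthreads"
    else:
        status = "overloaded"
    return f"{n_lines} x 1:localhost, OMP={omp}: {t} threads, {status}"


if __name__ == "__main__":
    # The three settings tried in this message.
    for n_lines, omp in [(8, 2), (16, 1), (16, 2)]:
        print(describe(n_lines, omp))
```

This only counts threads. It says nothing about how the operating system places them on P-cores and E-cores, which is the open question in this thread.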

I have an i7-13700K with 64 GB of RAM and an NVMe SSD. During the parallel 36-atom-slab calculation around 35 GB is used.


Best,
Lukasz

PS: Now omp_lapwso also works for me in .machines. I think it was a SOC issue with my test case (which was bulk Au). I am sorry for this confusion.





On 2023-02-14 10:23, Peter Blaha wrote:

I have no experience for such a CPU with fast and slow cores.

Simply test which settings give the fastest turnaround for a fixed
number of k-points, varying the number of processes (should be
compatible with your k-points) and OMP=1-2 (4).

Previously, overloading (using more cores than the physical cores) was
NOT a good idea, but I don't know how this "fused" CPU behaves. Maybe
some "small" overloading is ok. This all depends on #-kpoints and
available cores.
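
One possible reading of "compatible with your k-points" is that the k-points should split evenly over the parallel processes, so that no process sits idle in the last round. The sketch below illustrates that reading only. It assumes every k-point takes about the same time and that each process handles one k-point at a time; the function names and the k-point count of 24 are hypothetical.

```python
import math


def rounds(n_kpoints, n_procs):
    """Number of passes needed when n_procs processes share n_kpoints."""
    return math.ceil(n_kpoints / n_procs)


def idle_slots(n_kpoints, n_procs):
    """Process slots left without a k-point in the final pass."""
    return rounds(n_kpoints, n_procs) * n_procs - n_kpoints


def even_splits(n_kpoints, max_procs):
    """Process counts up to max_procs that divide the k-points evenly."""
    return [p for p in range(1, max_procs + 1) if n_kpoints % p == 0]


if __name__ == "__main__":
    n_kpoints = 24  # hypothetical example value, not taken from this thread
    for n_procs in (8, 12, 16):
        print(n_procs, rounds(n_kpoints, n_procs), idle_slots(n_kpoints, n_procs))
    print(even_splits(n_kpoints, 16))
```

Under these assumptions, 16 processes on 24 k-points need as many passes as 12 processes do, so the extra processes would not shorten the cycle.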

PS:

I cannot verify your omp_lapwso:2 failure. My tests run fine and the
omp-setting is taken over properly.
I am now using a machine with an i7-13700K. This CPU has 8 performance cores (P-cores) and 8 efficient cores (E-cores). In addition each P-core has 2 threads, so there are 24 threads altogether. It is hard to find reasonable info online, but probably a P-core is approx. 2x faster than an E-core:
https://www.anandtech.com/show/17047/the-intel-12th-gen-core-i912900k-review-hybrid-performance-brings-hybrid-complexity/10
This will of course depend on what is being calculated...

Do you have suggestions on how to optimize the .machines file for the parallel execution of an scf cycle?

On my machine using OMP_NUM_THREADS leads to oscillations of the CPU use (for a large slab maybe 40% of the time is spent on a single thread), suggesting that large OMP is not the optimal strategy.

Some examples of strategies:

One strategy would be to repeat the line
1:localhost
24 times, to have all the threads busy, and set OMP_NUM_THREADS=1.

Another would be to set the line
1:localhost
8 times and set OMP_NUM_THREADS=2, this would mean using all 16 physical cores.

Or perhaps one should better "overload" the CPU, e.g. by doing 1:localhost 16 times and OMP=2?

Over time I will try to benchmark some of the different options, but perhaps there is some logic of how one should think about this.
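
For benchmarking, the candidate files can be generated rather than typed by hand. This is a minimal sketch, assuming the omp_global line takes the same name:value form as the omp_lapwso:2 line mentioned in this thread. The function name machines_text and the strategy labels are hypothetical, and any other lines a real .machines file may need are left out.

```python
def machines_text(n_lines, omp, host="localhost"):
    """Return .machines content: one omp_global line, then n_lines k-parallel lines."""
    lines = [f"omp_global:{omp}"]          # assumed name:value syntax
    lines += [f"1:{host}"] * n_lines       # one k-parallel job per line
    return "\n".join(lines) + "\n"


# (number of 1:localhost lines, OMP threads) for the strategies in this message.
STRATEGIES = {
    "all_threads": (24, 1),      # every hardware thread busy, OMP=1
    "physical_cores": (8, 2),    # 16 threads in total
    "overloaded": (16, 2),       # 32 threads on 24 hardware threads
}

if __name__ == "__main__":
    for name, (n_lines, omp) in STRATEGIES.items():
        print(f"# {name}")
        print(machines_text(n_lines, omp))
```

Each generated text could be saved as .machines before timing one SCF cycle. Whether omp_global is actually honoured in every run mode is a separate question raised in this thread.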

In addition I have a comment on the .machines file. It seems that for the FM+SOC (runsp -so) calculations the
omp_global

setting in .machines is ignored. The

omp_lapw1
omp_lapw2
settings seem to work fine. So, I tried to set OMP for lapwso separately, by including a line like:
omp_lapwso:2

but this gives an error when executing parallel scf.

Best,
Lukasz
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
