This makes sense.
Please let me know if it shows

EXECUTING: /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco

or only    nmr -case ...

In any case, it is running correctly.

PS: I know that the current step also needs a lot of memory; after all, it needs to read the eigenvectors of all eigenvalues, ...

PPS: -quota 8 (or 24) might help while still utilizing all cores, but I'm not sure whether it would save enough memory in the current step.



On 12.05.2024 at 10:09, Michael Fechtelkord via Wien wrote:
Hello all, hello Peter,


That is what is really running in the background (from htop: this is a new job with 4 nodes but it was the same with 8 nodes -p 1 - 8), so no nmr_mpi.


CPU%  MEM%  TIME+     Command
96.0  14.9  19h06:05  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 3
95.8  14.9  19h05:10  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 1
95.1  14.9  19h06:00  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 2
95.5  15.4  19h08:10  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 4
94.6  14.9  18h35:33  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 3
93.3  15.4  18h36:24  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 4
93.3  14.9  18h33:02  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 2
94.0  14.9  18h38:44  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 1
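
A rough way to read the memory figures in this snapshot is sketched below. It assumes that the second numeric column is htop's MEM% (the column header is truncated in the listing), that each `-p` index appears twice only because htop lists threads, and that a job in an 8-job run needs about as much memory as a job in this 4-job run. The RAM and swap sizes are the ones quoted in the original post; the function names are made up for this illustration.

```python
RAM_GB = 130.0   # RAM size quoted in the original post
SWAP_GB = 16.0   # swap file size quoted in the original post

# Assumption: second numeric column of the htop snapshot is MEM%,
# one value per distinct "-p" job index (1..4).
MEM_PERCENT = {1: 14.9, 2: 14.9, 3: 14.9, 4: 15.4}


def job_memory_gb(mem_percent: float, ram_gb: float = RAM_GB) -> float:
    """Convert a MEM% reading into gigabytes of RAM."""
    return mem_percent / 100.0 * ram_gb


def total_memory_gb(n_jobs: int, per_job_gb: float) -> float:
    """Estimate total memory if every job needs per_job_gb."""
    return n_jobs * per_job_gb


def max_jobs(per_job_gb: float, budget_gb: float) -> int:
    """Largest number of jobs whose combined estimate fits the budget."""
    return int(budget_gb // per_job_gb)


if __name__ == "__main__":
    # Use the largest observed job as a cautious per-job estimate.
    per_job = max(job_memory_gb(p) for p in MEM_PERCENT.values())
    for n in (4, 8):
        total = total_memory_gb(n, per_job)
        print(f"{n} jobs: ~{total:.0f} GB "
              f"(RAM {RAM_GB:.0f} GB, RAM+swap {RAM_GB + SWAP_GB:.0f} GB)")
    print("jobs that fit in RAM alone:", max_jobs(per_job, RAM_GB))
```

Under these assumptions each job holds roughly 20 GB, so four jobs need about 80 GB while eight would need about 160 GB, which is more than RAM and swap combined. The snapshot is a single point in time, so the real peak may well be higher.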


Regards,

Michael


On 11.05.2024 at 20:10, Michael Fechtelkord via Wien wrote:
Hello Peter,


I just use "x_nmr_lapw -p" and the rest is initiated by the nmr script. The line "/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco" is just part of the whole procedure and was not started by me manually. (I only copied the last lines of the calculation.)


Best regards,

Michael


On 11.05.2024 at 18:08, Peter Blaha wrote:
Hello Michael,

I don't understand the line:

/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current -green         -scratch /scratch/WIEN2k/ -noco

The current mode should run only k-parallel, not under MPI?

PS: The repetition of

nmr_integ:localhost    is useless.

nmr mode integ runs only once (not k-parallel, sumpara has already summed up the currents)

But one can use       nmr_integ:localhost:8
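
A minimal sketch of a `.machines` file that follows this advice is shown below, written as a small Python helper that prints the file content. The helper name `build_machines` and its parameters are hypothetical and not part of WIEN2k; the keywords and values are taken from the file quoted in this thread (with the first keyword spelled `granularity`), and whether `omp_global:2` is a good choice for a given number of k-parallel jobs is left open.

```python
def build_machines(n_kparallel: int, omp_lapw0: int = 8, omp_global: int = 2,
                   integ_cores: int = 8, host: str = "localhost") -> str:
    """Return .machines text with n_kparallel k-parallel jobs and one nmr_integ line."""
    lines = [
        "granularity:1",
        f"omp_lapw0:{omp_lapw0}",
        f"omp_global:{omp_global}",
    ]
    # One "1:host" line per k-parallel job.
    lines += [f"1:{host}"] * n_kparallel
    # A single nmr_integ line, since the integ step runs only once;
    # the trailing number requests several cores for that one run.
    lines.append(f"nmr_integ:{host}:{integ_cores}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_machines(n_kparallel=4), end="")
```

With `n_kparallel=4` this yields four `1:localhost` lines followed by a single `nmr_integ:localhost:8` line, instead of eight repeated `nmr_integ:localhost` entries.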


Best regards

On 11.05.2024 at 16:19, Michael Fechtelkord via Wien wrote:
Hello Peter,

this is the .machines file content:

granularity:1
omp_lapw0:8
omp_global:2
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost


Best regards,

Michael


On 11.05.2024 at 14:58, Peter Blaha wrote:
Hmm. ?

Are you using k-parallel AND mpi-parallel? This could overload the machine.

What does the .machines file look like?


On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
Dear all,


I have run into the following problem using the NMR part of WIEN2k (23.2) on an openSUSE Leap 15.5 Intel platform. WIEN2k was compiled using oneAPI 2024.1 ifort and gcc 13.2.1. I am using ELPA 2024.03.01, Libxc 6.22, fftw 3.3.10 and MPICH 4.2.1 and the oneAPI 2024.1 MKL libraries. The CPU is an i9-14900K with 24 cores, of which I use eight for the calculations. The RAM is 130 GB, with a swap file of 16 GB on a Samsung PCIe 4.0 NVMe SSD. The memory speed is 5600 MT/s.

The structure is a layer silicate; to simulate a Si:Al ratio of 3:1, I currently use a 1:1:2 supercell. The new structure has monoclinic symmetry P 2/c (the original is C 2/c) and contains 40 atoms (K, Al, Si, O, and F).

I use 3 NMR LOs for K and O and 10 for Si, Al, and F (where I need the chemical shifts). The k-mesh has 40 k-points.

The interesting thing is that the RAM is sufficient during the NMR vector calculations (always under 100 GB of RAM occupied) and at the beginning of the electron current calculation. However, RAM use then rises to a critical point in the calculation, and more and more data is moved to the swap file, which is sometimes 80% full.

As you can see in the output below, this time only one job failed because it ran out of memory. With 48 k-points, however, three jobs crashed and with them the whole current calculation. The reason for the crash is clear to me, but I do not understand why the current calculation is so sensitive with so few atoms and such a small k-mesh. I have run calculations with more atoms and a 1000 k-point mesh on 4 cores, and they worked fine. Could the Intel MKL library be the source of the failure? Or should I go back to 4 cores, even if the calculation takes longer?

Have a nice weekend, all!


Best wishes from

Michael Fechtelkord

-----------------------------------------------

cd ./  ...  x lcore  -f MS_2M1_Al2
 CORE  END
0.685u 0.028s 0:00.71 98.5%     0+0k 2336+16168io 5pf+0w

lcore        ....  ready


 EXECUTING:     /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current    -green         -scratch /scratch/WIEN2k/ -noco

[1] 20253
[2] 20257
[3] 20261
[4] 20265
[5] 20269
[6] 20273
[7] 20277
[8] 20281
[8]  + Aborted                       ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[7]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[6]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[5]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[4]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[3]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[2]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[1]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop

 EXECUTING:     /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode sumpara  -p 8    -green -scratch /scratch/WIEN2k/


current        ....  ready


 EXECUTING:     mpirun -np 1 -machinefile .machine_nmrinteg /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green


nmr:  integration  ... done in   4032.3s


stop


--
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300
Email: peter.bl...@tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at
-------------------------------------------------------------------------
