Dear Wien2k users,

I recently encountered a strange situation in parallel execution of Wien2k
(version 19). Normally I run Wien2k jobs using OpenMP and they work without
any trouble. However, a recent project requires me to run Wien2k with
k-point parallelization, and I have run into a problem that I cannot solve.

Issue:

• When running Wien2k with k-point parallelization (the -p option in
run_lapw together with a .machines file), the job stalls at the lapw1 stage and
produces neither lapw1 output (such as case.vector_* files) nor error messages.
• Terminating the job and running the command “x lapw1 -p” reproduces the
symptom. Checking the active processes on the compute node while “x lapw1 -p”
is running does not show any lapw1 jobs, only the suspended lapw1para
script.
• Removing the -p option and running in serial or with OpenMP multithreading
works fine.
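
The process check described above can be sketched as follows. This is only
an illustration, assuming a Linux node with the standard ps command; the
function names find_processes and list_lapw1_processes are hypothetical and
not part of Wien2k.

```python
import subprocess


def find_processes(ps_output, keyword="lapw1"):
    """Return the lines of `ps` output whose text contains `keyword`.

    The first line of the output is assumed to be the column header
    and is skipped.
    """
    lines = ps_output.splitlines()
    return [line for line in lines[1:] if keyword in line]


def list_lapw1_processes():
    """Run `ps` and keep only the lines that mention lapw1.

    The columns requested are PID, process state, elapsed time and the
    full command line, so that a waiting lapw1para script can be told
    apart from an actual lapw1 executable.
    """
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,etime,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_processes(result.stdout)
```

Calling list_lapw1_processes() on the compute node while “x lapw1 -p” is
running lists every matching process; in the situation described here only
the lapw1para script would be expected to appear.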

Further info. on my system:

• Wien2k version: 19.1 (also unofficially tried with version 23, the same 
problem persists)
• System: Ubuntu 20.04 LTS
• Compiler, math library: Intel oneAPI 2023, with Intel icc, ifort,
mpiifort, and MKL (LAPACK, BLACS, ScaLAPACK).
• FFTW: FFTW3, compiled using intel compilers from source (ver. 3.3.8)
• MPI: Intel MPI from the Intel oneAPI package, with MPI_REMOTE = 0
    • I tried both with and without MPI parallelization; the same lapw1
suspension persists.

My .machines file looks like below (for a 4 core test job):
----
granularity:1
1:localhost
1:localhost
1:localhost
1:localhost
extrafine:1
----
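
For other core counts, a file with the same layout can be generated with a
few lines of Python. This is a minimal sketch that only reproduces the
structure shown above (one “1:host” line per k-point job); the function name
make_machines and its parameters are hypothetical.

```python
def make_machines(n_jobs, host="localhost", granularity=1, extrafine=1):
    """Build the text of a .machines file for k-point parallel jobs.

    One "1:<host>" line is written per job, framed by the granularity
    and extrafine lines, matching the 4-core test file above.
    """
    lines = [f"granularity:{granularity}"]
    lines += [f"1:{host}"] * n_jobs
    lines.append(f"extrafine:{extrafine}")
    return "\n".join(lines) + "\n"


def write_machines(path, n_jobs, host="localhost"):
    """Write the generated text to `path` (for example ".machines")."""
    with open(path, "w") as handle:
        handle.write(make_machines(n_jobs, host))
```

Calling make_machines(4) returns exactly the six lines listed above.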

I checked that, after running “x lapw1 -p”, a set of case.klist_* and
lapw1_*.def files is created in the working directory (along with “.machine*”
files). Running each k-divided case with lapw1 directly (for example, with
commands like “lapw1 lapw1_1.def”) works fine and creates the case.vector_*
files correctly. Strangely, the actual “x lapw1 -p” (or “lapw1para_lapw
lapw1.def”) never reaches the lapw1-running stage and stalls somewhere before
that.
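
The manual test above (running lapw1 on each generated def file in turn) can
be sketched as a short script. It assumes that the lapw1 executable is on the
PATH and that the lapw1_*.def files already exist in the working directory;
the function names sorted_def_files and run_serially are hypothetical.

```python
import glob
import os
import re
import subprocess


def sorted_def_files(names):
    """Select files named lapw1_<n>.def and sort them by the number n.

    The unsplit lapw1.def and any other files are ignored.
    """
    pattern = re.compile(r"^lapw1_(\d+)\.def$")
    indexed = []
    for name in names:
        match = pattern.match(name)
        if match:
            indexed.append((int(match.group(1)), name))
    return [name for _, name in sorted(indexed)]


def run_serially(workdir="."):
    """Run `lapw1 <def file>` once per k-point chunk, one after another."""
    paths = glob.glob(os.path.join(workdir, "lapw1_*.def"))
    names = [os.path.basename(path) for path in paths]
    for name in sorted_def_files(names):
        # Assumption: lapw1 is found on the PATH of the current shell.
        result = subprocess.run(["lapw1", name], cwd=workdir)
        print(name, "exit code", result.returncode)
```

Each chunk runs in the foreground here, so any failure shows up with its exit
code instead of being hidden behind the parallel wrapper script.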

Because this suspension produces no error or other messages, I have no
idea how to solve this issue. So far I have tried the following:

• Recompiling Wien2k without any MPI-related options (including with
MPI_REMOTE set to 1)
• Tuning DELAY and SLEEPY in lapw1para
• Running the parallel job on local storage (not on NFS storage)
• As mentioned above, using the newer Wien2k version 23 (purely as a test! I
am not producing any scientific results with it)
• Removing FFTW3, although this should not matter because lapw1 does not seem
to use FFTW

None of these resolved the issue.

I searched the Wien2k mailing list archive and, although I may have missed
something, I couldn’t find any issue similar to mine. Any comments would be
highly appreciated!

Best regards,
Heung-Sik

---
Heung-Sik Kim
Assistant Professor
Department of Physics
Kangwon National University
email: heungsi...@kangwon.ac.kr
https://sites.google.com/view/heungsikim/
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
