Dear colleagues,

I'm trying to get the parallel version of WIEN2k 14.2 running on our institute's cluster, but I'm stuck. I searched the mailing list for the error message I receive, but did not find any relevant postings.

This is the problem I face:
The serial version of WIEN2k is running fine (also when using qsub and the submission system). However, when trying to use the parallel version I get the following error when lapw1 is invoked in parallel mode:
"Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers."

Here are the facts on hardware and software:
I'm using WIEN2k 14.2 to run the example TiC case from the UG as a test to check that everything works. I compiled the software using the Intel Cluster Studio with the Intel Fortran Compiler 16.0, Intel C++ Compiler 16.0, Intel MKL 11.3 and Intel MPI.
I compiled it using FFTW3 and the MKL versions of LAPACK, ScaLAPACK and BLACS.

The cluster consists of 37 nodes with 20 CPUs each. It runs a CentOS Linux distribution and uses SGE as the submission system. In the submission script, orte is used as the parallel environment.

This is what I've tried so far:
I don't know why it can't find the MKL symbols, as I export all my environment variables in the submission script, which I also checked by printing them. As additional information I include the submission script (which is still a bit messy, as I adapted it from another program for testing), the :parallel file with the error messages (which I renamed to .parallel so that Thunderbird on Windows can handle it), the output file (TiC.o14540) and the .machines file.
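
For reference, the .machines file is built from the SGE host file roughly as in the following sketch. It only reproduces the lines for parallel lapw1/lapw2 that the submission script writes; hostfile_to_machines is a hypothetical helper name that does not appear in the actual script, and the two sample host lines are copied from the output of job 14540.

```shell
#!/bin/bash
# Sketch: turn SGE PE_HOSTFILE lines ("host slots queue ...") into
# .machines lines of the form "1:host:slots", as the submission script does.
# hostfile_to_machines is a hypothetical helper name.
hostfile_to_machines() {
    echo "granularity:1"
    echo "extrafine:1"
    # column 1 = host name, column 2 = number of slots granted on that host
    awk '{print "1:" $1 ":" $2}' "$1"
}

# Sample host file, copied from the "hosts" lines of TiC.o14540
sample=$(mktemp)
printf '%s\n' \
    "node-0-10.local 20 normal.q@node-0-10.local 20" \
    "node-0-8.local 4 normal.q@node-0-8.local 20" > "$sample"

machines=$(hostfile_to_machines "$sample")
echo "$machines"
rm -f "$sample"
```

With these two host lines the sketch gives one entry per node, each carrying the slot count SGE granted on that node.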

I get the same error messages when using OpenMPI instead of the IntelMPI, but the output is a bit different (see TiC.o14541).

I tried different ways of exporting my environment variables (to get rid of unnecessary entries), and I also tried mpi instead of orte as the parallel environment in the submission script. I also tried requesting different numbers of slots.
None of this got rid of the error message.
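
One further check I can think of is whether the MKL library directory is really on LD_LIBRARY_PATH in the shell that starts the parallel processes. Below is a minimal sketch of such a check. The helper name path_has_dir is hypothetical, and the lib/intel64 subdirectory below MKLROOT is an assumption about my installation, not something I have verified on every node.

```shell
#!/bin/bash
# path_has_dir LIST DIR: succeed if DIR is an entry of the colon-separated LIST.
# (hypothetical helper, not part of WIEN2k or SGE)
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# MKLROOT as set in the submission script; lib/intel64 is an assumed layout.
MKLROOT=${MKLROOT:-/share/apps/intel/compilers_and_libraries_2016.0.109/linux/mkl}
mkl_lib="$MKLROOT/lib/intel64"

if path_has_dir "${LD_LIBRARY_PATH:-}" "$mkl_lib"; then
    echo "MKL lib dir is on LD_LIBRARY_PATH"
else
    echo "MKL lib dir is NOT on LD_LIBRARY_PATH"
fi
```

Running the same test through ssh on each node from the host file should show what a non-interactive shell actually sees there. Running ldd on the MPI binary (lapw1_mpi in $WIENROOT, assuming the usual binary name) and filtering for mkl should show which MKL libraries it resolves at run time.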

Has anybody experienced a similar problem, or are there any hints on what else I can check?

Thanks for your help.
All the best,

Dr.techn. Walid Hetaba

Fritz-Haber-Institut der Max-Planck-Gesellschaft
Department of Inorganic Chemistry
Faradayweg 4-6, 14195 Berlin, Germany
T: +49 30 8413-4412

#$ -S /bin/bash
#$ -hard
#$ -cwd
#$ -j y
#$ -m bea
#$ -l h_cpu=9000000
#$ -M
#$ -N TiC 
####$ -v MPIR_HOME=/share/apps/openmpi/bin
#$ -V
#$ -pe orte 24
# for orca NUMPROC must be aligned to "%pal nprocs 400 end" in input file
# and to the above statement -pe orte xxx
export NUMPROC=24

# use single threaded processes with open mpi
# on yfhix only ssh is supported
export RSH_COMMAND=ssh

export WIENROOT=/home/hetaba/WIEN2k
#export SCRATCH=/scratch/hetaba

# Remnants of original submission file
# orca input output file
# export INPUT_FILE=${JOB_NAME}.inp
# export OUTPUT_FILE=${JOB_NAME}.out

#-------------- MAIN PROGRAM NAME -------------
# export PROGRAM=/home/hetaba/ORCA/orca_3_0_3_linux_x86-64/orca
#-------------- Set Environment ---------------
# Do we need this when using -V ?
# export PATH=/share/apps/openmpi/bin:${PATH}
# This is needed to use OpenMPI instead of IntelMPI when using -V
# export LD_LIBRARY_PATH=/home/hetaba/FFTW3/lib:$LD_LIBRARY_PATH
# Maybe this is needed for FFTW3, check it later
# just to learn which SGE environment variables are available
echo " SGE_ROOT : ${SGE_ROOT}"
echo " SGE_O_HOME : ${SGE_O_HOME}"
echo " SGE_O_HOST : ${SGE_O_HOST}"
echo " SGE_O_MAIL : ${SGE_O_MAIL}"
echo " SGE_O_PATH : ${SGE_O_PATH}"
echo " SGE_O_SHELL : ${SGE_O_SHELL}"
echo " SGE_TASK_ID : ${SGE_TASK_ID}"
echo " HOME : ${HOME}"
echo " JOB_ID : ${JOB_ID}"
echo " JOB_NAME : ${JOB_NAME}"
echo " NHOSTS : ${NHOSTS}"
echo " NSLOTS : ${NSLOTS}"
echo " hosts : `cat ${PE_HOSTFILE}`"


#echo " TMPDIR : ${TMPDIR}"

#----- Generating .machines -----
echo " Generating .machines file"
# remove old file
rm .machines
# For details see WIEN2k User Guide p.77
# Load balancing and k-point distribution
echo "granularity:1" >> .machines
echo "extrafine:1" >> .machines
# nodes for parallel lapw1/2
for host in `cat $PE_HOSTFILE | awk '{print $1 ":" $2}'`; do
        echo "1:$host" >> .machines
done
# nodes for parallel lapw0
# deactivated these lines because problems with FFTW in lapw0
# have to deal with this later
# echo -n "lapw0:" >> .machines
# for host in `cat $PE_HOSTFILE | awk '{print $1 ":" $2}'`; do
#       echo -n "$host " >> .machines
# done
# Get a newline at the end of the file, don't know whether it's necessary
echo " " >> .machines
# For testing: print .machines
cat .machines

export MKLROOT=/share/apps/intel/compilers_and_libraries_2016.0.109/linux/mkl

# all jobs on yfhix should use the node local /scratch space
# its on a non backuped fast netapp store
# temp files should be stored there, !not on /home

#------ create scratch on all working nodes ------
#export  WORKDIR=/scratch/${USER}_${JOB_ID}
#export  CASEDIR=${JOB_NAME}
# Creating scratch with case-folder-name, required by WIEN2k
export  WORKDIR=/scratch/${USER}_${JOB_ID}/${JOB_NAME}
export  SCRATCH=${WORKDIR} # needed by WIEN2k
for host in `cat $PE_HOSTFILE | awk '{print $1}'`; do 
  echo " create scratch on $host "
  ssh $host mkdir -p ${WORKDIR}
done

# copy input files to the initial working node (mpi!)
echo " copy input files to ${HOSTNAME}:${WORKDIR}"
# second source needed to copy .machines and .machine0/1... and all other created hidden files

echo "Starting ${PROG_NAME} in $( pwd ) on `date`"
# Testing: which mpirun command is used
echo "mpirun: "
which mpirun
# run scf-cycle in parallel with charge convergence, Test-setup
run_lapw -p -cc 0.0001

echo "Your job ends on `date`"

# create dir in /home/'Directory where qsub was started'
mkdir ${SGE_O_WORKDIR}/${JOB_ID}
# copy output from scratch to this directory to separate output files

#------ remove scratch on all working nodes ------
for host in `cat $PE_HOSTFILE | awk '{print $1}'`; do 
  echo " remove scratch on $host "
  ssh $host rm -rf ${WORKDIR} 
done
# If job is aborted, do the scratch directories remain on the nodes ?
# Probably yes, as these lines will not be executed.
starting parallel lapw1 at Thu Apr 21 10:08:34 CEST 2016
     (23) Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.

0.009u 0.024s 0:00.59 3.3%      0+0k 5552+176io 17pf+0w
     (23) Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
0.008u 0.015s 0:01.35 0.7%      0+0k 616+40io 5pf+0w
     (1) Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.
0.011u 0.014s 0:00.36 5.5%      0+0k 0+168io 0pf+0w
   Summary of lapw1para:
   node-0-10.local       k=      user=   wallclock=
   node-0-8.local        k=      user=   wallclock=
<-  done at Thu Apr 21 10:08:37 CEST 2016
->  starting Fermi on node-0-10.local at Thu Apr 21 10:08:37 CEST 2016
**  LAPW2 crashed at Thu Apr 21 10:08:37 CEST 2016
**  check ERROR FILES!
 SGE_ROOT : /opt/gridengine
 SGE_JOB_SPOOL_DIR : /opt/gridengine/default/spool/node-0-10/active_jobs/14540.1
 SGE_O_HOME : /home/hetaba
 SGE_O_LOGNAME : hetaba
 SGE_O_MAIL : /var/spool/mail/hetaba
 SGE_O_SHELL : /bin/tcsh
 SGE_O_WORKDIR : /home/hetaba/Test/qsub-Test-1/TiC
 SGE_TASK_ID : undefined
 HOME : /home/hetaba
 HOSTNAME : node-0-10.local
 JOB_ID : 14540
 NSLOTS : 24
 hosts : node-0-10.local 20 normal.q@node-0-10.local 20
node-0-8.local 4 normal.q@node-0-8.local 20
 Generating .machines file
 create scratch on node-0-10.local 
 create scratch on node-0-8.local 
 copy input files to node-0-10.local:/scratch/hetaba_14540/TiC
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14479'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14480'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14481'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14482'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14483'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14484'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14485'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14486'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14487'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14488'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14489'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14490'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14491'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14492'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14493'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14494'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14502'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14503'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14507'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14509'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14510'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14511'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14512'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14513'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14514'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14515'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14516'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14517'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14519'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14529'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14535'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14536'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14537'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14538'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14539'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/Testing'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/Test-parallel'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/.'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/..'
Starting  in /scratch/hetaba_14540/TiC on Thu Apr 21 10:08:32 CEST 2016
TiC.scf1_1: No such file or directory.
grep: *scf1*: No such file or directory
FERMI - Error
cp: cannot stat `.in.tmp': No such file or directory

>   stop error
Your job ends on Thu Apr 21 10:08:37 CEST 2016
cp: omitting directory `/scratch/hetaba_14540/TiC/.'
cp: omitting directory `/scratch/hetaba_14540/TiC/..'
 remove scratch on node-0-10.local 
 remove scratch on node-0-8.local 
 SGE_ROOT : /opt/gridengine
 SGE_JOB_SPOOL_DIR : /opt/gridengine/default/spool/node-0-4/active_jobs/14541.1
 SGE_O_HOME : /home/hetaba
 SGE_O_LOGNAME : hetaba
 SGE_O_MAIL : /var/spool/mail/hetaba
 SGE_O_SHELL : /bin/tcsh
 SGE_O_WORKDIR : /home/hetaba/Test/qsub-Test-1/TiC
 SGE_TASK_ID : undefined
 HOME : /home/hetaba
 HOSTNAME : node-0-4.local
 JOB_ID : 14541
 NSLOTS : 24
 hosts : node-0-4.local 20 normal.q@node-0-4.local 20
node-0-12.local 4 normal.q@node-0-12.local 20
 Generating .machines file
 create scratch on node-0-4.local 
 create scratch on node-0-12.local 
 copy input files to node-0-4.local:/scratch/hetaba_14541/TiC
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14479'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14480'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14481'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14482'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14483'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14484'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14485'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14486'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14487'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14488'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14489'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14490'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14491'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14492'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14493'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14494'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14502'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14503'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14507'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14509'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14510'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14511'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14512'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14513'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14514'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14515'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14516'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14517'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14519'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14529'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14535'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14536'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14537'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14538'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14539'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/14540'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/Testing'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/Test-parallel'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/.'
cp: omitting directory `/home/hetaba/Test/qsub-Test-1/TiC/..'
Starting  in /scratch/hetaba_14541/TiC on Thu Apr 21 10:17:32 CEST 2016
[node-0-4.local:16439] [[16013,0],0] ORTE_ERROR_LOG: Take next option in file base/rmaps_base_support_fns.c at line 63
[node-0-4.local:16420] [[16030,0],0] ORTE_ERROR_LOG: Take next option in file base/rmaps_base_support_fns.c at line 63
[node-0-4.local:16439] [[16013,0],0] ORTE_ERROR_LOG: Take next option in file base/rmaps_base_support_fns.c at line 63
[node-0-4.local:16420] [[16030,0],0] ORTE_ERROR_LOG: Take next option in file base/rmaps_base_support_fns.c at line 63
[node-0-4.local:16491] [[16081,0],0] ORTE_ERROR_LOG: Take next option in file base/rmaps_base_support_fns.c at line 63
[node-0-4.local:16491] [[16081,0],0] ORTE_ERROR_LOG: Take next option in file base/rmaps_base_support_fns.c at line 63
w2k_dispatch_signal(): received: Terminated
application called MPI_Abort(MPI_COMM_WORLD, 33206400) - process 0
w2k_dispatch_signal(): received: Terminated
application called MPI_Abort(MPI_COMM_WORLD, 1105617995) - process 0
w2k_dispatch_signal(): received: Terminated
application called MPI_Abort(MPI_COMM_WORLD, 1337554352) - process 0
w2k_dispatch_signal(): received: Terminated
application called MPI_Abort(MPI_COMM_WORLD, 89250992) - process 0
w2k_dispatch_signal(): received: Terminated
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
w2k_dispatch_signal(): received: Terminated
application called MPI_Abort(MPI_COMM_WORLD, 22114432) - process 0
TiC.scf1_1: No such file or directory.
grep: *scf1*: No such file or directory
FERMI - Error
cp: cannot stat `.in.tmp': No such file or directory

>   stop error
Your job ends on Thu Apr 21 10:17:37 CEST 2016
cp: omitting directory `/scratch/hetaba_14541/TiC/.'
cp: omitting directory `/scratch/hetaba_14541/TiC/..'
 remove scratch on node-0-4.local 
 remove scratch on node-0-12.local 