Dear Gromacs users,

I'm trying to get the best performance out of a cluster in which every 
node has 8 CPUs and 1 GPU. As a test, I am running a Martini polarisable water 
system, but I have run into problems. While mdrun works with one MPI process, 
it crashes with 8 MPI processes sharing 1 GPU. Below is the whole sbatch script:


#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1

export CRAY_CUDA_MPS=1
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

time aprun -B mdrun_mpi-gpu -gpu_id 00000000 -ntomp 1 -deffnm md -v -c md.gro
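
For clarity, the -gpu_id string above has one digit per MPI rank on the node, each digit being the id of the GPU that rank should use, so eight zeros put all 8 ranks on GPU 0. A minimal sketch of how such a string could be generated rather than typed by hand; make_gpu_id is a hypothetical helper name of mine, not part of GROMACS or Slurm:

```shell
# Hypothetical helper (not a GROMACS tool): print GPU id 0 once per rank,
# i.e. a -gpu_id string that maps every rank on the node to the single GPU.
make_gpu_id() {
    ranks_per_node=$1
    # '%.0s' consumes one argument and prints nothing, so each argument
    # produced by seq yields exactly one literal '0'.
    printf '0%.0s' $(seq 1 "$ranks_per_node")
    printf '\n'
}

# Example: 8 ranks per node, as in the script above.
make_gpu_id 8
```

The result could then be passed as -gpu_id "$(make_gpu_id 8)", assuming the number of ranks per node really is 8.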

But all it gives me is a broken pipe error:

_pmiu_daemon(SIGCHLD): [NID 02124] [c1-1c0s3n0] [Tue Jun  2 16:25:48 2015] PE 
RANK 2 exit signal Broken pipe

[NID 02124] 2015-06-02 16:25:48 Apid 4833499: initiated application termination

I also tried one MPI task with 8 OpenMP threads, as well as other 
combinations, but I always get the same error.
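
To make "other combinations" concrete: by that I mean the ways of splitting the 8 CPUs of a node between MPI ranks and OpenMP threads. A small sketch that lists them; list_splits is a hypothetical helper name, and it assumes that only splits which use all the CPUs are of interest:

```shell
# Hypothetical helper: list every ranks x threads split that uses
# exactly the given number of CPUs on a node.
list_splits() {
    cores=$1
    ranks=$cores
    while [ "$ranks" -ge 1 ]; do
        # Keep only rank counts that divide the CPU count evenly.
        if [ $((cores % ranks)) -eq 0 ]; then
            echo "ranks=$ranks threads=$((cores / ranks))"
        fi
        ranks=$((ranks - 1))
    done
}

# Example: the 8 CPUs per node of this cluster.
list_splits 8
```

Each line corresponds to one pair of --ntasks-per-node and --cpus-per-task values, with -ntomp set to the thread count and one digit per rank in -gpu_id.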

From the core file of the mdrun crash I get the following backtrace:
> gdb mdrun core

#0  0x00002aaab2969885 in read_alias_file () from /lib64/libc.so.6

#1  0x00002aaab1612f65 in PMPI_Abort () from 
/opt/cray/lib64/libmpich_gnu_48.so.2

#2  0x00002aaaab909682 in gmx_abort (noderank=noderank@entry=4, 
nnodes=nnodes@entry=8, errorno=errorno@entry=-1) at 
/apps/santis/sandbox/lucamar/src/gromacs-4.6.3/src/gmxlib/network.c:518

#3  0x00002aaaab841dec in quit_gmx (msg=<optimized out>) at 
/apps/santis/sandbox/lucamar/src/gromacs-4.6.3/src/gmxlib/gmx_fatal.c:266

#4  0x00002aaaab842345 in _gmx_error (key=<optimized out>, msg=<optimized out>, 
file=0x2aaaabd6b010 <CSWTCH.6+40304> 
"/apps/santis/sandbox/lucamar/src/gromacs-4.6.3/src/gmxlib/gpu_utils/gpu_utils.cu",

    line=511) at 
/apps/santis/sandbox/lucamar/src/gromacs-4.6.3/src/gmxlib/gmx_fatal.c:774

#5  0x00002aaaabd274c5 in init_gpu () from 
/apps/daint/gromacs/4.6.3/gnu_481/lib/libgmx_mpi.so.8

#6  0x00002aaaab1875ba in pick_nbnxn_resources (hwinfo=0x667e70, 
bDoNonbonded=<optimized out>, bUseGPU=bUseGPU@entry=0x6be1e0, 
bEmulateGPU=bEmulateGPU@entry=0x7fffffff3c60, cr=<optimized out>,

    cr=<optimized out>, fp=<optimized out>) at 
/apps/santis/sandbox/lucamar/src/gromacs-4.6.3/src/mdlib/forcerec.c:1686

#7  0x00002aaaab19152b in init_nb_verlet (nbpu_opt=0x4481aa <cross_sec_h+4426> 
"auto", cr=0x65e4d0, fr=0x6bd140, ir=0x667810, nb_verlet=0x6bd328, fp=0x0)

    at /apps/santis/sandbox/lucamar/src/gromacs-4.6.3/src/mdlib/forcerec.c:1890

#8  init_forcerec (fp=0x0, oenv=oenv@entry=0x667780, fr=fr@entry=0x6bd140, 
fcd=fcd@entry=0xa45c70, ir=ir@entry=0x667810, mtop=mtop@entry=0x667c40, 
cr=cr@entry=0x65e4d0, box=box@entry=0x7fffffff3f20,

    bMolEpot=bMolEpot@entry=0, tabfn=0x6683d0 "dppc-gm1-2.xvg", 
tabafn=tabafn@entry=0x668410 "dppc-gm1-2.xvg", tabpfn=tabpfn@entry=0x668450 
"dppc-gm1-2.xvg", tabbfn=tabbfn@entry=0x668490 "dppc-gm1-2.xvg",

    nbpu_opt=nbpu_opt@entry=0x4481aa <cross_sec_h+4426> "auto", 
bNoSolvOpt=bNoSolvOpt@entry=0, print_force=print_force@entry=-1) at 
/apps/santis/sandbox/lucamar/src/gromacs-4.6.3/src/mdlib/forcerec.c:2890

#9  0x000000000040fef6 in mdrunner (hw_opt=hw_opt@entry=0x7fffffff59b0, 
fplog=0x0, cr=cr@entry=0x65e4d0, nfile=nfile@entry=36, 
fnm=fnm@entry=0x7fffffff5fc0, oenv=0x667780, bVerbose=bVerbose@entry=1,

    bCompact=bCompact@entry=1, nstglobalcomm=-1, 
ddxyz=ddxyz@entry=0x7fffffff5900, dd_node_order=dd_node_order@entry=1, 
rdd=<optimized out>, rconstr=<optimized out>,

    dddlb_opt=dddlb_opt@entry=0x4481aa <cross_sec_h+4426> "auto", 
dlb_scale=0.800000012, ddcsx=0x0, ddcsy=0x0, ddcsz=0x0, nbpu_opt=<optimized 
out>, nsteps_cmdline=-2, nstepout=100, resetstep=-1, nmultisim=0,

    repl_ex_nst=0, repl_ex_nex=0, repl_ex_seed=-1, pforce=-1, cpt_period=15, 
max_hours=-1, deviceOptions=deviceOptions@entry=0x4481f3 <cross_sec_h+4499> "", 
Flags=<optimized out>)

    at /apps/santis/sandbox/lucamar/src/gromacs-4.6.3/src/kernel/runner.c:1404

#10 0x000000000043b58d in cmain (argc=1, argv=0x667720) at 
/apps/santis/sandbox/lucamar/src/gromacs-4.6.3/src/kernel/mdrun.c:737


[1] 
https://www.acrc.a-star.edu.sg/docs/ASTAR%20GPU%20symposium-22th-Jan-2014.pdf


Best regards,

Kirill