Thank you, Berk. I am still getting an error when I run MPI-compiled GROMACS 4.6.1 with -np set as you suggested.
I ran like this:

  mpirun -np 6 /nics/b/home/cneale/exe/gromacs-4.6.1_cuda/exec2/bin/mdrun_mpi -notunepme -deffnm md3 -dlb yes -npme -1 -cpt 60 -maxh 0.1 -cpi md3.cpt -nsteps 5000000000 -pin on

Here is the .log file output:

Log file opened on Thu Apr 25 10:24:55 2013
Host: kfs064  pid: 38106  nodeid: 0  nnodes: 6

Gromacs version:    VERSION 4.6.1
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     enabled
GPU support:        enabled
invsqrt routine:    gmx_software_invsqrt(x)
CPU acceleration:   AVX_256
FFT library:        fftw-3.3.3-sse2
Large file support: enabled
RDTSCP usage:       enabled
Built on:           Tue Apr 23 12:43:12 EDT 2013
Built by:           [email protected] [CMAKE]
Build OS/arch:      Linux 2.6.32-220.4.1.el6.x86_64 x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Build CPU family:   6   Model: 45   Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /opt/intel/composer_xe_2011_sp1.11.339/bin/intel64/icc Intel icc (ICC) 12.1.5 20120612
C compiler flags:   -mavx -std=gnu99 -Wall -ip -funroll-all-loops -O3 -DNDEBUG
C++ compiler:       /opt/intel/composer_xe_2011_sp1.11.339/bin/intel64/icpc Intel icpc (ICC) 12.1.5 20120612
C++ compiler flags: -mavx -Wall -ip -funroll-all-loops -O3 -DNDEBUG
CUDA compiler:      nvcc: NVIDIA (R) Cuda compiler driver; Copyright (c) 2005-2012 NVIDIA Corporation; Built on Thu_Apr__5_00:24:31_PDT_2012; Cuda compilation tools, release 4.2, V0.2.1221
CUDA driver:        5.0
CUDA runtime:       4.20

:-)  G R O M A C S  (-:
Groningen Machine for Chemical Simulation
VERSION 4.6.1

Contributions from Mark Abraham, Emile Apol, Rossen Apostolov, Herman J.C. Berendsen, Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, Gerrit Groenhof, Christoph Junghans, Peter Kasson, Carsten Kutzner, Per Larsson, Pieter Meulenhoff, Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, Michael Shirts, Alfons Sijbers, Peter Tieleman, Berk Hess, David van der Spoel, and Erik Lindahl.

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2012,2013, The GROMACS development team at Uppsala University & The Royal Institute of Technology, Sweden. check out http://www.gromacs.org for more information.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

:-)  /nics/b/home/cneale/exe/gromacs-4.6.1_cuda/exec2/bin/mdrun_mpi  (-:

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C. Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

For optimal performance with a GPU nstlist (now 10) should be larger. The optimum depends on your CPU and GPU resources. You might want to try several nstlist values.
Can not increase nstlist for GPU run because verlet-buffer-drift is not set or used

Input Parameters: integrator = sd nsteps = 5000000 init-step = 0 cutoff-scheme = Verlet ns_type = Grid nstlist = 10 ndelta = 2 nstcomm = 100 comm-mode = Linear nstlog = 0 nstxout = 5000000 nstvout = 5000000 nstfout = 5000000 nstcalcenergy = 100 nstenergy = 50000 nstxtcout = 50000 init-t = 0 delta-t = 0.002 xtcprec = 1000 fourierspacing = 0.12 nkx = 64 nky = 64 nkz = 80 pme-order = 4 ewald-rtol = 1e-05 ewald-geometry = 0 epsilon-surface = 0 optimize-fft = TRUE ePBC = xyz bPeriodicMols = FALSE bContinuation = FALSE bShakeSOR = FALSE etc = No bPrintNHChains = FALSE nsttcouple = -1 epc = Berendsen epctype = Semiisotropic nstpcouple = 10 tau-p = 4 ref-p (3x3): ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00} ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00} ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00} compress (3x3): compress[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00} compress[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00} compress[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05} refcoord-scaling = No posres-com (3): posres-com[0]= 0.00000e+00 posres-com[1]= 0.00000e+00 posres-com[2]= 0.00000e+00 posres-comB (3): posres-comB[0]= 0.00000e+00 posres-comB[1]= 0.00000e+00 posres-comB[2]= 0.00000e+00 verlet-buffer-drift = -1 rlist = 1 rlistlong = 1 nstcalclr = 10 rtpi = 0.05 coulombtype = PME coulomb-modifier = Potential-shift rcoulomb-switch = 0 rcoulomb = 1 vdwtype = Cut-off vdw-modifier = Potential-shift rvdw-switch = 0 rvdw = 1 epsilon-r = 1 epsilon-rf = inf tabext = 1 implicit-solvent = No gb-algorithm = Still gb-epsilon-solvent = 80 nstgbradii = 1 rgbradii = 1 gb-saltconc = 0 gb-obc-alpha = 1 gb-obc-beta = 0.8 gb-obc-gamma = 4.85 gb-dielectric-offset = 0.009 sa-algorithm = Ace-approximation sa-surface-tension = 2.05016 DispCorr = EnerPres bSimTemp = FALSE free-energy = no nwall = 0 wall-type = 9-3 wall-atomtype[0] = -1 wall-atomtype[1] = -1 wall-density[0] = 0 wall-density[1] = 0 wall-ewald-zfac = 3 pull = no rotation = FALSE disre = No disre-weighting = Conservative disre-mixed = FALSE dr-fc = 1000 dr-tau = 0 nstdisreout = 100 orires-fc = 0 orires-tau = 0 nstorireout = 100 dihre-fc = 0 em-stepsize = 0.01 em-tol = 10 niter = 20 fc-stepsize = 0 nstcgsteep = 1000 nbfgscorr = 10 ConstAlg = Lincs shake-tol = 0.0001 lincs-order = 6 lincs-warnangle = 30 lincs-iter = 1 bd-fric = 0 ld-seed = 29660 cos-accel = 0 deform (3x3): deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} adress = FALSE userint1 = 0 userint2 = 0 userint3 = 0 userint4 = 0 userreal1 = 0 userreal2 = 0 userreal3 = 0 userreal4 = 0 grpopts: nrdf: 106748 ref-t: 310 tau-t: 1 anneal: No ann-npoints: 0 acc: 0 0 0 nfreeze: N N N energygrp-flags[ 0]: 0 efield-x: n = 0 efield-xt: n = 0 efield-y: n = 0 efield-yt: n = 0 efield-z: n = 0 efield-zt: n = 0 bQMMM = FALSE QMconstraints = 0 QMMMscheme = 0 scalefactor = 1 qm-opts: ngQM = 0

Overriding nsteps with value passed on the command line: 705032704 steps, 1410065.408 ps

Initializing Domain Decomposition on 6 nodes
Dynamic load balancing: yes
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
  two-body bonded interactions: 0.431 nm, LJ-14, atoms 101 108
  multi-body bonded interactions: 0.431 nm, Proper Dih., atoms 101 108
Minimum cell size due to bonded interactions: 0.475 nm
Maximum distance for 7 constraints, at 120 deg. angles, all-trans: 1.175 nm
Estimated maximum distance required for P-LINCS: 1.175 nm
This distance will limit the DD cell size, you can override this with -rcon
Using 0 separate PME nodes, per user request
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 6 cells with a minimum initial size of 1.469 nm
The maximum allowed number of cells is: X 5 Y 5 Z 6
Domain decomposition grid 3 x 1 x 2, separate PME nodes 0
PME domain decomposition: 6 x 1 x 1
Domain decomposition nodeid 0, coordinates 0 0 0

Using 6 MPI processes
Using 2 OpenMP threads per MPI process

Detecting CPU-specific acceleration. Present hardware specification:
Vendor: GenuineIntel
Brand: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Family: 6  Model: 45  Stepping: 7
Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Acceleration most likely to fit this hardware: AVX_256
Acceleration selected at GROMACS compile time: AVX_256

3 GPUs detected on host kfs064:
  #0: NVIDIA Tesla M2090, compute cap.: 2.0, ECC: yes, stat: compatible
  #1: NVIDIA Tesla M2090, compute cap.: 2.0, ECC: yes, stat: compatible
  #2: NVIDIA Tesla M2090, compute cap.: 2.0, ECC: yes, stat: compatible

-------------------------------------------------------
Program mdrun_mpi, VERSION 4.6.1
Source code file: /nics/b/home/cneale/exe/gromacs-4.6.1_cuda/source/src/gmxlib/gmx_detect_hardware.c, line: 356

Fatal error:
Incorrect launch configuration: mismatching number of PP MPI processes and GPUs per node.
mdrun_mpi was started with 6 PP MPI processes per node, but only 3 GPUs were detected.
For more information and tips for troubleshooting, please check the GROMACS website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

###################################################
###################################################
###################################################

And here is the stderr output:

[snip: same GROMACS 4.6.1 banner, contributor list, copyright, and license notice as in the .log output above]
Option       Filename   Type          Description
------------------------------------------------------------
-s md3.tpr Input Run input file: tpr tpb tpa -o md3.trr Output Full precision trajectory: trr trj cpt -x md3.xtc Output, Opt. Compressed trajectory (portable xdr format) -cpi md3.cpt Input, Opt! Checkpoint file -cpo md3.cpt Output, Opt. Checkpoint file -c md3.gro Output Structure file: gro g96 pdb etc. -e md3.edr Output Energy file -g md3.log Output Log file -dhdl md3.xvg Output, Opt. xvgr/xmgr file -field md3.xvg Output, Opt. xvgr/xmgr file -table md3.xvg Input, Opt. xvgr/xmgr file -tabletf md3.xvg Input, Opt. xvgr/xmgr file -tablep md3.xvg Input, Opt. xvgr/xmgr file -tableb md3.xvg Input, Opt. xvgr/xmgr file -rerun md3.xtc Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt -tpi md3.xvg Output, Opt. xvgr/xmgr file -tpid md3.xvg Output, Opt. xvgr/xmgr file -ei md3.edi Input, Opt. ED sampling input -eo md3.xvg Output, Opt. xvgr/xmgr file -j md3.gct Input, Opt. General coupling stuff -jo md3.gct Output, Opt. General coupling stuff -ffout md3.xvg Output, Opt. xvgr/xmgr file -devout md3.xvg Output, Opt. xvgr/xmgr file -runav md3.xvg Output, Opt. xvgr/xmgr file -px md3.xvg Output, Opt. xvgr/xmgr file -pf md3.xvg Output, Opt. xvgr/xmgr file -ro md3.xvg Output, Opt. xvgr/xmgr file -ra md3.log Output, Opt. Log file -rs md3.log Output, Opt. Log file -rt md3.log Output, Opt. Log file -mtx md3.mtx Output, Opt. Hessian matrix -dn md3.ndx Output, Opt. Index file -multidir md3 Input, Opt., Mult. Run directory -membed md3.dat Input, Opt. Generic data file -mp md3.top Input, Opt. Topology file -mn md3.ndx Input, Opt. Index file

Option       Type   Value   Description
------------------------------------------------------
-[no]h bool no Print help info and quit -[no]version bool no Print version info and quit -nice int 0 Set the nicelevel -deffnm string md3 Set the default filename for all file options -xvg enum xmgrace xvg plot formatting: xmgrace, xmgr or none -[no]pd bool no Use particle decompostion -dd vector 0 0 0 Domain decomposition grid, 0 is optimize -ddorder enum interleave DD node order: interleave, pp_pme or cartesian -npme int -1 Number of separate nodes to be used for PME, -1 is guess -nt int 0 Total number of threads to start (0 is guess) -ntmpi int 0 Number of thread-MPI threads to start (0 is guess) -ntomp int 0 Number of OpenMP threads per MPI process/thread to start (0 is guess) -ntomp_pme int 0 Number of OpenMP threads per MPI process/thread to start (0 is -ntomp) -pin enum on Fix threads (or processes) to specific cores: auto, on or off -pinoffset int 0 The starting logical core number for pinning to cores; used to avoid pinning threads from different mdrun instances to the same core -pinstride int 0 Pinning distance in logical cores for threads, use 0 to minimize the number of threads per physical core -gpu_id string List of GPU id's to use -[no]ddcheck bool yes Check for all bonded interactions with DD -rdd real 0 The maximum distance for bonded interactions with DD (nm), 0 is determine from initial coordinates -rcon real 0 Maximum distance for P-LINCS (nm), 0 is estimate -dlb enum yes Dynamic load balancing (with DD): auto, no or yes -dds real 0.8 Minimum allowed dlb scaling of the DD cell size -gcom int -1 Global communication frequency -nb enum auto Calculate non-bonded interactions on: auto, cpu, gpu or gpu_cpu -[no]tunepme bool no Optimize PME load between PP/PME nodes or GPU/CPU -[no]testverlet bool no Test the Verlet non-bonded scheme -[no]v bool no Be loud and noisy -[no]compact bool yes Write a compact log file -[no]seppot bool no Write separate V and dVdl terms for each interaction type and node to the log file(s) -pforce real -1 Print all forces larger than this (kJ/mol nm) -[no]reprod bool no Try to avoid optimizations that affect binary reproducibility -cpt real 60 Checkpoint interval (minutes) -[no]cpnum bool no Keep and number checkpoint files -[no]append bool yes Append to previous output files when continuing from checkpoint instead of adding the simulation part number to all file names -nsteps int 705032704 Run this number of steps, overrides .mdp file option -maxh real 0.1 Terminate after 0.99 times this time (hours) -multi int 0 Do multiple simulations in parallel -replex int 0 Attempt replica exchange periodically with this period (steps) -nex int 0 Number of random exchanges to carry out each exchange interval (N^3 is one suggestion). -nex zero or not specified gives neighbor replica exchange. -reseed int -1 Seed for replica exchange, -1 is generate a seed -[no]ionize bool no Do a simulation including the effect of an X-Ray bombardment on your system

Reading file md3.tpr, VERSION 4.6.1 (single precision)
Can not increase nstlist for GPU run because verlet-buffer-drift is not set or used

Overriding nsteps with value passed on the command line: 705032704 steps, 1410065.408 ps

Using 6 MPI processes
Using 2 OpenMP threads per MPI process

3 GPUs detected on host kfs064:
  #0: NVIDIA Tesla M2090, compute cap.: 2.0, ECC: yes, stat: compatible
  #1: NVIDIA Tesla M2090, compute cap.: 2.0, ECC: yes, stat: compatible
  #2: NVIDIA Tesla M2090, compute cap.: 2.0, ECC: yes, stat: compatible

-------------------------------------------------------
Program mdrun_mpi, VERSION 4.6.1
Source code file: /nics/b/home/cneale/exe/gromacs-4.6.1_cuda/source/src/gmxlib/gmx_detect_hardware.c, line: 356

Fatal error:
Incorrect launch configuration: mismatching number of PP MPI processes and GPUs per node.
mdrun_mpi was started with 6 PP MPI processes per node, but only 3 GPUs were detected.
For more information and tips for troubleshooting, please check the GROMACS website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 0 out of 6

gcq#6: Thanx for Using GROMACS - Have a Nice Day

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 38106 on node kfs064 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination".

This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Thank you very much for your help,
Chris.
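P.S. In case it is useful context, here is the kind of launch I am planning to try next, on the assumption that the fix is to make the number of PP ranks per node consistent with the 3 GPUs (either by using 3 ranks, or by mapping 6 ranks onto the 3 GPUs with -gpu_id). I have not tested either line yet, and the -np/-ntomp/-gpu_id values below are only my guesses for this 3-GPU node:

  # guess A: one PP rank per GPU, more OpenMP threads per rank
  mpirun -np 3 /nics/b/home/cneale/exe/gromacs-4.6.1_cuda/exec2/bin/mdrun_mpi -ntomp 4 -gpu_id 012 -notunepme -deffnm md3 -dlb yes -npme -1 -cpt 60 -maxh 0.1 -cpi md3.cpt -pin on

  # guess B: keep 6 PP ranks and share each GPU between two ranks
  mpirun -np 6 /nics/b/home/cneale/exe/gromacs-4.6.1_cuda/exec2/bin/mdrun_mpi -ntomp 2 -gpu_id 001122 -notunepme -deffnm md3 -dlb yes -npme -1 -cpt 60 -maxh 0.1 -cpi md3.cpt -pin on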

