On Thu, May 9, 2019 at 10:01 PM Alex <nedoma...@gmail.com> wrote:
> Okay, we're positively unable to run a Gromacs (2019.1) test on Power9.
> The test procedure is simple, using slurm:
> 1. Request an interactive session:
>    srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
> 2. Load CUDA library: module load cuda
> 3. Run test batch. This starts with a CPU-only static EM, which, despite
>    the mdrun variables, runs on a single thread. Any help will be highly
>    appreciated.
>
> md.log below:
>
> GROMACS:      gmx mdrun, version 2019.1
> Executable:   /home/reida/ppc64le/stow/gromacs/bin/gmx
> Data prefix:  /home/reida/ppc64le/stow/gromacs
> Working dir:  /home/smolyan/gmx_test1
> Process ID:   115831
> Command line:
>   gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu -s em.tpr -o traj.trr -g md.log -c after_em.pdb
>
> GROMACS version:    2019.1
> Precision:          single
> Memory model:       64 bit
> MPI library:        thread_mpi
> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support:        CUDA
> SIMD instructions:  IBM_VSX
> FFT library:        fftw-3.3.8
> RDTSCP usage:       disabled
> TNG support:        enabled
> Hwloc support:      hwloc-1.11.8
> Tracing support:    disabled
> C compiler:         /opt/rh/devtoolset-7/root/usr/bin/cc GNU 7.3.1
> C compiler flags:   -mcpu=power9 -mtune=power9 -mvsx -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> C++ compiler:       /opt/rh/devtoolset-7/root/usr/bin/c++ GNU 7.3.1
> C++ compiler flags: -mcpu=power9 -mtune=power9 -mvsx -std=c++11 -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> CUDA compiler:      /usr/local/cuda-10.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver; Copyright (c) 2005-2018 NVIDIA Corporation; Built on Sat_Aug_25_21:10:00_CDT_2018; Cuda compilation tools, release 10.0, V10.0.130
> CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;;
>   -mcpu=power9;-mtune=power9;-mvsx;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> CUDA driver:        10.10
> CUDA runtime:       10.0
>
> Running on 1 node with total 160 cores, 160 logical cores, 1 compatible GPU
> Hardware detected:
>   CPU info:
>     Vendor: IBM
>     Brand:  POWER9, altivec supported
>     Family: 0   Model: 0   Stepping: 0
>     Features: vmx vsx
>   Hardware topology: Only logical processor count
>   GPU info:
>     Number of GPUs detected: 1
>     #0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> *SKIPPED*
>
> Input Parameters:
>    integrator                     = steep
>    tinit                          = 0
>    dt                             = 0.001
>    nsteps                         = 50000
>    init-step                      = 0
>    simulation-part                = 1
>    comm-mode                      = Linear
>    nstcomm                        = 100
>    bd-fric                        = 0
>    ld-seed                        = 1941752878
>    emtol                          = 100
>    emstep                         = 0.01
>    niter                          = 20
>    fcstep                         = 0
>    nstcgsteep                     = 1000
>    nbfgscorr                      = 10
>    rtpi                           = 0.05
>    nstxout                        = 0
>    nstvout                        = 0
>    nstfout                        = 0
>    nstlog                         = 1000
>    nstcalcenergy                  = 100
>    nstenergy                      = 1000
>    nstxout-compressed             = 0
>    compressed-x-precision         = 1000
>    cutoff-scheme                  = Verlet
>    nstlist                        = 1
>    ns-type                        = Grid
>    pbc                            = xyz
>    periodic-molecules             = true
>    verlet-buffer-tolerance        = 0.005
>    rlist                          = 1.2
>    coulombtype                    = PME
>    coulomb-modifier               = Potential-shift
>    rcoulomb-switch                = 0
>    rcoulomb                       = 1.2
>    epsilon-r                      = 1
>    epsilon-rf                     = inf
>    vdw-type                       = Cut-off
>    vdw-modifier                   = Potential-shift
>    rvdw-switch                    = 0
>    rvdw                           = 1.2
>    DispCorr                       = No
>    table-extension                = 1
>    fourierspacing                 = 0.12
>    fourier-nx                     = 52
>    fourier-ny                     = 52
>    fourier-nz                     = 52
>    pme-order                      = 4
>    ewald-rtol                     = 1e-05
>    ewald-rtol-lj                  = 0.001
>    lj-pme-comb-rule               = Geometric
>    ewald-geometry                 = 0
>    epsilon-surface                = 0
>    tcoupl                         = No
>    nsttcouple                     = -1
>    nh-chain-length                = 0
>    print-nose-hoover-chain-variables = false
>    pcoupl                         = No
>    pcoupltype                     = Isotropic
>    nstpcouple                     = -1
>    tau-p                          = 1
>    compressibility (3x3):
>       compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>    ref-p (3x3):
>       ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>    refcoord-scaling               = No
>    posres-com (3):
>       posres-com[0]= 0.00000e+00
>       posres-com[1]= 0.00000e+00
>       posres-com[2]= 0.00000e+00
>    posres-comB (3):
>       posres-comB[0]= 0.00000e+00
>       posres-comB[1]= 0.00000e+00
>       posres-comB[2]= 0.00000e+00
>    QMMM                           = false
>    QMconstraints                  = 0
>    QMMMscheme                     = 0
>    MMChargeScaleFactor            = 1
>    qm-opts:
>    ngQM                           = 0
>    constraint-algorithm           = Lincs
>    continuation                   = false
>    Shake-SOR                      = false
>    shake-tol                      = 0.0001
>    lincs-order                    = 4
>    lincs-iter                     = 1
>    lincs-warnangle                = 30
>    nwall                          = 0
>    wall-type                      = 9-3
>    wall-r-linpot                  = -1
>    wall-atomtype[0]               = -1
>    wall-atomtype[1]               = -1
>    wall-density[0]                = 0
>    wall-density[1]                = 0
>    wall-ewald-zfac                = 3
>    pull                           = false
>    awh                            = false
>    rotation                       = false
>    interactiveMD                  = false
>    disre                          = No
>    disre-weighting                = Conservative
>    disre-mixed                    = false
>    dr-fc                          = 1000
>    dr-tau                         = 0
>    nstdisreout                    = 100
>    orire-fc                       = 0
>    orire-tau                      = 0
>    nstorireout                    = 100
>    free-energy                    = no
>    cos-acceleration               = 0
>    deform (3x3):
>       deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>    simulated-tempering            = false
>    swapcoords                     = no
>    userint1                       = 0
>    userint2                       = 0
>    userint3                       = 0
>    userint4                       = 0
>    userreal1                      = 0
>    userreal2                      = 0
>    userreal3                      = 0
>    userreal4                      = 0
>    applied-forces:
>      electric-field:
>        x:
>          E0                       = 0
>          omega                    = 0
>          t0                       = 0
>          sigma                    = 0
>        y:
>          E0                       = 0
>          omega                    = 0
>          t0                       = 0
>          sigma                    = 0
>        z:
>          E0                       = 0
>          omega                    = 0
>          t0                       = 0
>          sigma                    = 0
> grpopts:
>    nrdf: 47805
>    ref-t: 0
>    tau-t: 0
> annealing: No
> annealing-npoints: 0
> acc: 0 0 0
> nfreeze: N N N
> energygrp-flags[ 0]: 0
>
> Initializing Domain Decomposition on 4 ranks
> NOTE: disabling dynamic load balancing as it is only supported with
> dynamics, not with integrator 'steep'.
> Dynamic load balancing: auto
> Using update groups, nr 10529, average size 2.5 atoms, max. radius 0.078 nm
> Minimum cell size due to atom displacement: 0.000 nm
> NOTE: Periodic molecules are present in this system. Because of this, the
> domain decomposition algorithm cannot easily determine the minimum cell
> size that it requires for treating bonded interactions. Instead, domain
> decomposition will assume that half the non-bonded cut-off will be a
> suitable lower bound.
> Minimum cell size due to bonded interactions: 0.678 nm
> Using 0 separate PME ranks, as there are too few total ranks for
> efficient splitting
> Optimizing the DD grid for 4 cells with a minimum initial size of 0.678 nm
> The maximum allowed number of cells is: X 8 Y 8 Z 8
> Domain decomposition grid 1 x 4 x 1, separate PME ranks 0
> PME domain decomposition: 1 x 4 x 1
> Domain decomposition rank 0, coordinates 0 0 0
>
> The initial number of communication pulses is: Y 1
> The initial domain decomposition cell size is: Y 1.50 nm
>
> The maximum allowed distance for atom groups involved in interactions is:
>     non-bonded interactions               1.356 nm
>     two-body bonded interactions  (-rdd)  1.356 nm
>     multi-body bonded interactions (-rdd) 1.356 nm
>     virtual site constructions   (-rcon)  1.503 nm
>
> Using 4 MPI threads
> Using 4 OpenMP threads per tMPI thread
>
> Overriding thread affinity set outside gmx mdrun
>
> Pinning threads with a user-specified logical core stride of 2
>
> NOTE: Thread affinity was not set.
The threads are not being pinned -- see the notes above -- but I can't tell
from the log alone why. I suggest: i) talk to your admins, and ii) try to
tell the job scheduler not to set affinities, so that mdrun can set them
itself (a sketch of what I mean is at the end of this mail).

> System total charge: 0.000
> Will do PME sum in reciprocal space for electrostatic interactions.
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and
> L. G. Pedersen
> A smooth particle mesh Ewald method
> J. Chem. Phys. 103 (1995) pp. 8577-8592
> -------- -------- --- Thank You --- -------- --------
>
> Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
> Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -8.333e-06
> Initialized non-bonded Ewald correction tables, spacing: 1.02e-03 size: 1176
>
> Generated table with 1100 data points for 1-4 COUL.
> Tabscale = 500 points/nm
> Generated table with 1100 data points for 1-4 LJ6.
> Tabscale = 500 points/nm
> Generated table with 1100 data points for 1-4 LJ12.
> Tabscale = 500 points/nm
>
> Using SIMD 4x4 nonbonded short-range kernels
>
> Using a 4x4 pair-list setup:
>   updated every 1 steps, buffer 0.000 nm, rlist 1.200 nm
>
> Using geometric Lennard-Jones combination rule
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> S. Miyamoto and P. A. Kollman
> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for
> Rigid Water Models
> J. Comp. Chem. 13 (1992) pp. 952-962
> -------- -------- --- Thank You --- -------- --------
>
> Linking all bonded interactions to atoms
> There are 5407 inter charge-group virtual sites,
> will an extra communication step for selected coordinates and forces
>
> Note that activating steepest-descent energy minimization via the
> integrator .mdp option and the command gmx mdrun may be available in a
> different form in a future version of GROMACS, e.g. gmx minimize and an
> .mdp option.
> Initiating Steepest Descents
>
> Atom distribution over 4 domains: av 6687 stddev 134 min 6515 max 6792
> Started Steepest Descents on rank 0 Thu May 9 15:49:36 2019
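For ii), here is a minimal sketch of what I mean, assuming a reasonably
recent Slurm (--cpu-bind and -c/--cpus-per-task are standard srun options,
but the partition and gres names are your site's, so check the details
with your admins; the mdrun flags are the ones you already use, plus
-pinoffset):

  # Ask Slurm for one task with enough CPUs, and tell it not to bind the
  # task, so mdrun is free to pin its threads itself:
  srun -N 1 -n 1 -c 32 --cpu-bind=none --partition=debug --time=1:00:00 --gres=gpu:1 --pty bash

  module load cuda

  # 4 thread-MPI ranks x 4 OpenMP threads = 16 threads; with -pinstride 2
  # they occupy 32 of the 160 hardware threads, starting at logical core 0:
  gmx mdrun -pin on -pinoffset 0 -pinstride 2 -ntmpi 4 -ntomp 4 -pme cpu -nb cpu -s em.tpr -o traj.trr -g md.log -c after_em.pdb

Note also that your srun line requests -n 20, i.e. 20 tasks. With a
thread-MPI build of GROMACS you want a single task that owns enough CPUs
for all of mdrun's threads, hence the -n 1 -c 32 above. If everything
still lands on one core after that, the scheduler's cgroup/affinity
configuration is the next thing I would ask your admins about.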