Roland Schulz wrote:
Justin,
I think the interaction kernel is not OK on your PowerPC machine. I
infer that from two things: 1) The force seems to be zero (minimization
output). 2) When you use the all-vs-all kernel, which is not available
among the PowerPC kernels, mdrun automatically falls back to the C kernel
and then it works.
Sounds about right.
What is the kernel you are using? It should say in the log file. Look
for: "Configuring single precision IBM Power6-specific Fortran kernels"
or "Testing Altivec/VMX support"
I'm not finding either in the config.log - weird?
You can also check in config.h whether GMX_POWER6 and/or
GMX_PPC_ALTIVEC is set. I suggest you try to compile with one or both of
them deactivated and see whether that solves it. This will also make it
slower, so if this is indeed the problem, you will probably want to
figure out why the fastest kernel doesn't work correctly in order to get
good performance.
It looks like GMX_PPC_ALTIVEC is set. I suppose I could re-compile with this
turned off.
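A quick way to see which of the two macros a build enabled is to search config.h directly. The sketch below is only a convenience wrapper around grep: the function name kernel_macros is made up for illustration, and it assumes the macros show up as ordinary autoconf-style "#define" lines (active) or commented-out "#undef" lines (inactive).

```shell
#!/bin/sh
# Hypothetical helper: report which PowerPC kernel macros are active in a
# config.h. Assumes autoconf-style lines such as "#define GMX_PPC_ALTIVEC"
# (active) or "/* #undef GMX_POWER6 */" (inactive).
kernel_macros() {
    config_h=$1
    for macro in GMX_POWER6 GMX_PPC_ALTIVEC; do
        # Match only active "#define MACRO" lines, not commented-out #undef.
        if grep -Eq "^[[:space:]]*#[[:space:]]*define[[:space:]]+${macro}([[:space:]]|\$)" "$config_h"; then
            echo "$macro: set"
        else
            echo "$macro: not set"
        fi
    done
}

# Usage (the path is an example; point it at the config.h of your build):
#   kernel_macros src/config.h
```

Running it on both the autoconf and the CMake build trees would show whether the two builds actually selected different kernels.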
Here's what's even weirder. The problematic version was compiled using the
standard autoconf procedure. If I use a CMake-compiled version, the energy
minimization runs fine, giving the same results (energy and force) as the two
systems I know are good. So I guess there's something wrong with the way
autoconf installed Gromacs. Perhaps this isn't of concern since Gromacs will
require CMake in subsequent releases, but I figure I should at least report it
in case it affects anyone else.
If I may tack one more question on here, I'm wondering why my CMake installation
doesn't actually appear to be using MPI. I get the right result, but the
problem is, I get a .log, .edr, and .trr for every processor that's being used,
as if each processor is being given its own job and not distributing the work.
Here's how I compiled my MPI mdrun, version 4.5.1:
$ cmake ../gromacs-4.5.1 \
    -DFFTW3F_LIBRARIES=/home/rdiv1001/fftw-3.0.1-osx/lib/libfftw3f.a \
    -DFFTW3F_INCLUDE_DIR=/home/rdiv1001/fftw-3.0.1-osx/include/ \
    -DCMAKE_INSTALL_PREFIX=/home/rdiv1001/gromacs-4.5_cmake-osx \
    -DGMX_BINARY_SUFFIX=_4.5_cmake_mpi \
    -DGMX_THREADS=OFF \
    -DBUILD_SHARED_LIBS=OFF \
    -DGMX_X11=OFF \
    -DGMX_MPI=ON \
    -DMPI_COMPILER=/home/rdiv1001/compilers/openmpi-1.2.3-osx/bin/mpicxx \
    -DMPI_INCLUDE_PATH=/home/rdiv1001/compilers/openmpi-1.2.3-osx/include
$ make mdrun
$ make install-mdrun
Is there anything obviously wrong with those commands? Is there any way to
tell (before actually using mdrun) whether or not I've done things right?
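One check that can be made before running mdrun is to read back what CMake actually recorded in CMakeCache.txt in the build directory, where each option is stored as NAME:TYPE=VALUE. The helper below is an illustrative sketch (the function name cache_value is hypothetical). It only confirms that an option was accepted at configure time, not that the resulting binary is launched correctly.

```shell
#!/bin/sh
# Hypothetical helper: print the value CMake recorded for an option.
# CMakeCache.txt stores entries as NAME:TYPE=VALUE, e.g. GMX_MPI:BOOL=ON.
cache_value() {
    cache_file=$1
    option=$2
    # Drop the "NAME:TYPE=" prefix of the matching line and print the rest.
    sed -n "s/^${option}:[A-Z]*=//p" "$cache_file"
}

# Usage, from the build directory:
#   cache_value CMakeCache.txt GMX_MPI        # an MPI build should report ON
#   cache_value CMakeCache.txt MPI_COMPILER   # the compiler wrapper CMake used
```

If GMX_MPI is reported as ON and every process still writes its own .log, .edr, and .trr, one thing worth ruling out (an assumption, not something established in this thread) is that the mpirun used to launch the job belongs to a different MPI installation than the mpicxx used for the build.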
-Justin
Roland
On Mon, Sep 27, 2010 at 4:59 PM, Justin A. Lemkul wrote:
Hi All,
I'm hoping I might get some tips in tracking down the source of an
issue that appears to be hardware-specific, leading to crashes in my
system. The failures are occurring on our supercomputer (Mac OSX
10.3, PowerPC). Running the same .tpr file on my laptop (Mac OSX
10.5.8, Intel Core2Duo) and on another workstation (Ubuntu 10.04,
AMD64) produces identical results. I suspect the problem stems from
unsuccessful energy minimization, which then leads to a crash when
running full MD. All jobs were run in parallel on two cores. The
supercomputer does not support threading, so MPI is invoked using
MPICH-1.2.5 (native MPI implementation on the cluster).
Details as follows:
EM md.log file: successful run (Intel Core2Duo or AMD64)
Steepest Descents converged to Fmax < 1000 in 7 steps
Potential Energy = -4.8878180e+04
Maximum force = 8.7791553e+02 on atom 5440
Norm of force = 1.1781271e+02
EM md.log file: unsuccessful run (PowerPC)
Steepest Descents converged to Fmax < 1000 in 1 steps
Potential Energy = -2.4873273e+04
Maximum force = 0.0000000e+00 on atom 0
Norm of force = nan
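The failed minimization is recognizable from the log alone: the force norm is nan and the maximum force is exactly zero. A minimal sketch of an automated check for those two symptoms follows. The function name em_log_suspect is hypothetical, and the patterns assume the summary lines look like the excerpts above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if an EM md.log shows the
# symptoms of the PowerPC run: "Norm of force = nan" or a maximum force of
# exactly zero. Fail (exit status 1) otherwise.
em_log_suspect() {
    log=$1
    awk '
        # Last field of the "Norm of force" line is the value itself.
        /^[[:space:]]*Norm of force/ && $NF == "nan" { bad = 1 }
        # Fields: Maximum force = <value> on atom <index>
        /^[[:space:]]*Maximum force/ && $4 + 0 == 0  { bad = 1 }
        END { exit bad ? 0 : 1 }
    ' "$log"
}

# Usage:
#   if em_log_suspect md.log; then echo "minimization looks broken"; fi
```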
MD invoked from the minimized structure generated on my laptop or
AMD64 runs successfully (at least for a few hundred steps in my
test), but the MD on the PowerPC cluster fails immediately:
           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
             U-B      Proper Dih.    Improper Dih.        CMAP Dih.  GB Polarization
     7.93559e+03      9.34958e+03      2.24036e+02     -2.47750e+03     -7.83599e+04
           LJ-14       Coulomb-14          LJ (SR)     Coulomb (SR)        Potential
     7.70042e+03      9.94520e+04     -1.17168e+04     -5.79783e+04     -2.55780e+04
     Kinetic En.     Total Energy      Temperature   Pressure (bar)     Constr. rmsd
             nan              nan              nan      0.00000e+00              nan
   Constr.2 rmsd
             nan
DD step 9 load imb.: force 3.0%
-------------------------------------------------------
Program mdrun_4.5.1_mpi, VERSION 4.5.1
Source code file: nsgrid.c, line: 601
Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.
Variable ind has value 7131. It should have been within [ 0 .. 7131 ]
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
It seems as if the crash really shouldn't be happening, if the value
range is inclusive.
Running with all-vs-all kernels works, but the performance is
horrendously slow (<300 ps per day for a 7131-atom system) so I am
attempting to use long cutoffs (2.0 nm) as others on the list have
suggested.
Details of the installations and .mdp files are appended below.
-Justin
=== em.mdp ===
; Run parameters
integrator = steep ; EM
emstep = 0.005
emtol = 1000
nsteps = 50000
nstcomm = 1
comm_mode = angular ; non-periodic system
; Bond parameters
constraint_algorithm = lincs
constraints = all-bonds
continuation = no ; starting up
; required cutoffs for implicit
nstlist = 1
ns_type = grid
rlist = 2.0
rcoulomb = 2.0
rvdw = 2.0
; cutoffs required for qq and vdw
coulombtype = cut-off
vdwtype = cut-off
; temperature coupling
tcoupl = no
; Pressure coupling is off
Pcoupl = no
; Periodic boundary conditions are off for implicit
pbc = no
; Settings for implicit solvent
implicit_solvent = GBSA
gb_algorithm = OBC
rgbradii = 2.0
=== md.mdp ===
; Run parameters
integrator = sd ; velocity Langevin dynamics
dt = 0.002
nsteps = 2500000 ; 5000 ps (5 ns)
nstcomm = 1
comm_mode = angular ; non-periodic system
; Output parameters
nstxout = 0 ; nst[xvf]out = 0 to suppress useless .trr output
nstvout = 0
nstfout = 0
nstlog = 5000 ; 10 ps
nstenergy = 5000 ; 10 ps
nstxtcout = 5000 ; 10 ps
; Bond parameters
constraint_algorithm = lincs
constraints = all-bonds
continuation = no ; starting up
; required cutoffs for implicit
nstlist = 10
ns_type = grid
rlist = 2.0
rcoulomb = 2.0
rvdw = 2.0
; cutoffs required for qq and vdw
coulombtype = cut-off
vdwtype = cut-off
; temperature coupling
tc_grps = System
tau_t = 1.0 ; inverse friction coefficient for Langevin (ps^-1)
ref_t = 310
; Pressure coupling is off
Pcoupl = no
; Generate velocities is on
gen_vel = yes
gen_temp = 310
gen_seed = 173529
; Periodic boundary conditions are off for implicit
pbc = no
; Free energy must be off to use all-vs-all kernels
; default, but just for the sake of being pedantic
free_energy = no
; Settings for implicit solvent
implicit_solvent = GBSA
gb_algorithm = OBC
rgbradii = 2.0
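The time annotations in the md.mdp comments follow from dt: the simulated time is nsteps x dt, and each output interval is the corresponding nst* value x dt. A small sketch that recomputes them from an .mdp file is given below. It assumes simple "key = value ; comment" lines as in the listing above, and the function name mdp_times is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert step counts in an .mdp file to picoseconds.
# Assumes one "key = value ; comment" setting per line and dt given in ps.
mdp_times() {
    awk -F'[=;]' '
        {
            key = $1; val = $2
            gsub(/[[:space:]]/, "", key)   # strip padding around the key
            gsub(/[[:space:]]/, "", val)   # strip padding around the value
        }
        key == "dt" { dt = val }
        key == "nsteps" || key == "nstlog" || key == "nstenergy" || key == "nstxtcout" {
            steps[key] = val
            order[++count] = key           # remember the order of appearance
        }
        END {
            for (i = 1; i <= count; i++)
                printf "%s: %g ps\n", order[i], steps[order[i]] * dt
        }
    ' "$1"
}

# Usage:
#   mdp_times md.mdp
```

For the md.mdp above, the nsteps entry should come out at the 5000 ps stated in its comment, and the three output intervals at 10 ps each.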
=== Installation commands for the cluster ===
$ ./configure --prefix=/home/rdiv1001/gromacs-4.5 \
    CPPFLAGS="-I/home/rdiv1001/fftw-3.0.1-osx/include" \
    LDFLAGS="-L/home/rdiv1001/fftw-3.0.1-osx/lib" \
    --disable-threads --without-x --program-suffix=_4.5.1_s
$ make
$ make install
$ make distclean
$ ./configure --prefix=/home/rdiv1001/gromacs-4.5 \
    CPPFLAGS="-I/home/rdiv1001/fftw-3.0.1-osx/include" \
    LDFLAGS="-L/home/rdiv1001/fftw-3.0.1-osx/lib" \
    --disable-threads --without-x --program-suffix=_4.5.1_mpi --enable-mpi \
    CXXCPP="/nfs/compilers/mpich-1.2.5/bin/mpicxx -E"
$ make mdrun
$ make install-mdrun
--
========================================
Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================
--
gmx-users mailing list [email protected]
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the www
interface or send it to [email protected].
Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309