----- Original Message -----
From: "Justin A. Lemkul" <[email protected]>
Date: Tuesday, September 28, 2010 11:11
Subject: Re: [gmx-users] Hardware-specific crash with 4.5.1
To: Gromacs Users' List <[email protected]>
>
> Roland Schulz wrote:
> > Justin,
> >
> > I think the interaction kernel is not OK on your PowerPC machine. I
> > assume that from: 1) The force seems to be zero (minimization output).
> > 2) When you use the all-to-all kernel which is not available for the
> > powerpc kernel, it automatically falls back to the C kernel and then
> > it works.
> >
>
> Sounds about right.
>
> > What is the kernel you are using? It should say in the log file. Look
> > for: "Configuring single precision IBM Power6-specific Fortran kernels"
> > or "Testing Altivec/VMX support"
> >
>
> I'm not finding either in the config.log - weird?

You were meant to look in the mdrun.log for runtime confirmation of what kernels GROMACS has decided to use.

> > You can also look in the config.h whether GMX_POWER6 and/or
> > GMX_PPC_ALTIVEC is set. I suggest you try to compile with one/both of
> > them deactivated and see whether that solves it. This will make it
> > slower too. Thus if this is indeed the problem, you will probably want
> > to figure out why the fastest kernel doesn't work correctly to get good
> > performance.
> >
>
> It looks like GMX_PPC_ALTIVEC is set. I suppose I could re-compile with
> this turned off.

This is supposed to be fine for Mac, as I understand.

> Here's what's even weirder. The problematic version was compiled using
> the standard autoconf procedure. If I use a CMake-compiled version, the
> energy minimization runs fine, giving the same results (energy and force)
> as the two systems I know are good. So I guess there's something wrong
> with the way autoconf installed Gromacs. Perhaps this isn't of concern
> since Gromacs will require CMake in subsequent releases, but I figure I
> should at least report it in case it affects anyone else.
>
> If I may tack one more question on here, I'm wondering why my CMake
> installation doesn't actually appear to be using MPI. I get the right
> result, but the problem is, I get a .log, .edr, and .trr for every
> processor that's being used, as if each processor is being given its own
> job and not distributing the work. Here's how I compiled my MPI mdrun,
> version 4.5.1:

At the start and end of the .log files you should get indicators about how many MPI processes were actually being used.

> cmake ../gromacs-4.5.1 -DFFTW3F_LIBRARIES=/home/rdiv1001/fftw-3.0.1-osx/lib/libfftw3f.a -DFFTW3F_INCLUDE_DIR=/home/rdiv1001/fftw-3.0.1-osx/include/ -DCMAKE_INSTALL_PREFIX=/home/rdiv1001/gromacs-4.5_cmake-osx -DGMX_BINARY_SUFFIX=_4.5_cmake_mpi -DGMX_THREADS=OFF -DBUILD_SHARED_LIBS=OFF -DGMX_X11=OFF -DGMX_MPI=ON -DMPI_COMPILER=/home/rdiv1001/compilers/openmpi-1.2.3-osx/bin/mpicxx -DMPI_INCLUDE_PATH=/home/rdiv1001/compilers/openmpi-1.2.3-osx/include
>
> $ make mdrun
>
> $ make install-mdrun
>
> Is there anything obviously wrong with those commands? Is there any way
> I should know (before actually using mdrun) whether or not I've done
> things right?

I think there ought to be, but IMO not enough preparation and testing has gone into the CMake switch for it to be usable.

Mark

> -Justin
>
> > Roland
> >
> >
> > On Mon, Sep 27, 2010 at 4:59 PM, Justin A. Lemkul <[email protected] <mailto:[email protected]>> wrote:
> >
> >
> >     Hi All,
> >
> >     I'm hoping I might get some tips in tracking down the source of an
> >     issue that appears to be hardware-specific, leading to crashes in my
> >     system. The failures are occurring on our supercomputer (Mac OSX
> >     10.3, PowerPC). Running the same .tpr file on my laptop (Mac OSX
> >     10.5.8, Intel Core2Duo) and on another workstation (Ubuntu 10.04,
> >     AMD64) produce identical results. I suspect the problem stems from
> >     unsuccessful energy minimization, which then leads to a crash when
> >     running full MD. All jobs were run in parallel on two cores. The
> >     supercomputer does not support threading, so MPI is invoked using
> >     MPICH-1.2.5 (native MPI implementation on the cluster).
> >
> >
> >     Details as follows:
> >
> >     EM md.log file: successful run (Intel Core2Duo or AMD64)
> >
> >     Steepest Descents converged to Fmax < 1000 in 7 steps
> >     Potential Energy  = -4.8878180e+04
> >     Maximum force     =  8.7791553e+02 on atom 5440
> >     Norm of force     =  1.1781271e+02
> >
> >
> >     EM md.log file: unsuccessful run (PowerPC)
> >
> >     Steepest Descents converged to Fmax < 1000 in 1 steps
> >     Potential Energy  = -2.4873273e+04
> >     Maximum force     =  0.0000000e+00 on atom 0
> >     Norm of force     =            nan
> >
> >
> >     MD invoked from the minimized structure generated on my laptop or
> >     AMD64 runs successfully (at least for a few hundred steps in my
> >     test), but the MD on the PowerPC cluster fails immediately:
> >
> >                Step           Time         Lambda
> >                   0        0.00000        0.00000
> >
> >        Energies (kJ/mol)
> >                 U-B    Proper Dih.  Improper Dih.      CMAP Dih.GB Polarization
> >         7.93559e+03    9.34958e+03    2.24036e+02   -2.47750e+03   -7.83599e+04
> >               LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)      Potential
> >         7.70042e+03    9.94520e+04   -1.17168e+04   -5.79783e+04   -2.55780e+04
> >         Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
> >                 nan            nan            nan    0.00000e+00            nan
> >       Constr.2 rmsd
> >                 nan
> >
> >     DD step 9 load imb.: force 3.0%
> >
> >
> >     -------------------------------------------------------
> >     Program mdrun_4.5.1_mpi, VERSION 4.5.1
> >     Source code file: nsgrid.c, line: 601
> >
> >     Range checking error:
> >     Explanation: During neighborsearching, we assign each particle to a grid
> >     based on its coordinates. If your system contains collisions or parameter
> >     errors that give particles very high velocities you might end up with some
> >     coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
> >     put these on a grid, so this is usually where we detect those errors.
> >     Make sure your system is properly energy-minimized and that the potential
> >     energy seems reasonable before trying again.
> >     Variable ind has value 7131. It should have been within [ 0 .. 7131 ]
> >
> >     For more information and tips for troubleshooting, please check the GROMACS
> >     website at http://www.gromacs.org/Documentation/Errors
> >     -------------------------------------------------------
> >
> >     It seems as if the crash really shouldn't be happening, if the value
> >     range is inclusive.
> >
> >     Running with all-vs-all kernels works, but the performance is
> >     horrendously slow (<300 ps per day for a 7131-atom system) so I am
> >     attempting to use long cutoffs (2.0 nm) as others on the list have
> >     suggested.
> >
> >     Details of the installations and .mdp files are appended below.
> >
> >     -Justin
> >
> >     === em.mdp ===
> >     ; Run parameters
> >     integrator              = steep     ; EM
> >     emstep                  = 0.005
> >     emtol                   = 1000
> >     nsteps                  = 50000
> >     nstcomm                 = 1
> >     comm_mode               = angular   ; non-periodic system
> >     ; Bond parameters
> >     constraint_algorithm    = lincs
> >     constraints             = all-bonds
> >     continuation            = no        ; starting up
> >     ; required cutoffs for implicit
> >     nstlist                 = 1
> >     ns_type                 = grid
> >     rlist                   = 2.0
> >     rcoulomb                = 2.0
> >     rvdw                    = 2.0
> >     ; cutoffs required for qq and vdw
> >     coulombtype             = cut-off
> >     vdwtype                 = cut-off
> >     ; temperature coupling
> >     tcoupl                  = no
> >     ; Pressure coupling is off
> >     Pcoupl                  = no
> >     ; Periodic boundary conditions are off for implicit
> >     pbc                     = no
> >     ; Settings for implicit solvent
> >     implicit_solvent        = GBSA
> >     gb_algorithm            = OBC
> >     rgbradii                = 2.0
> >
> >
> >     === md.mdp ===
> >
> >     ; Run parameters
> >     integrator              = sd        ; velocity Langevin dynamics
> >     dt                      = 0.002
> >     nsteps                  = 2500000   ; 5000 ps (5 ns)
> >     nstcomm                 = 1
> >     comm_mode               = angular   ; non-periodic system
> >     ; Output parameters
> >     nstxout                 = 0         ; nst[xvf]out = 0 to suppress useless .trr output
> >     nstvout                 = 0
> >     nstfout                 = 0
> >     nstlog                  = 5000      ; 10 ps
> >     nstenergy               = 5000      ; 10 ps
> >     nstxtcout               = 5000      ; 10 ps
> >     ; Bond parameters
> >     constraint_algorithm    = lincs
> >     constraints             = all-bonds
> >     continuation            = no        ; starting up
> >     ; required cutoffs for implicit
> >     nstlist                 = 10
> >     ns_type                 = grid
> >     rlist                   = 2.0
> >     rcoulomb                = 2.0
> >     rvdw                    = 2.0
> >     ; cutoffs required for qq and vdw
> >     coulombtype             = cut-off
> >     vdwtype                 = cut-off
> >     ; temperature coupling
> >     tc_grps                 = System
> >     tau_t                   = 1.0       ; inverse friction coefficient for Langevin (ps^-1)
> >     ref_t                   = 310
> >     ; Pressure coupling is off
> >     Pcoupl                  = no
> >     ; Generate velocities is on
> >     gen_vel                 = yes
> >     gen_temp                = 310
> >     gen_seed                = 173529
> >     ; Periodic boundary conditions are off for implicit
> >     pbc                     = no
> >     ; Free energy must be off to use all-vs-all kernels
> >     ; default, but just for the sake of being pedantic
> >     free_energy             = no
> >     ; Settings for implicit solvent
> >     implicit_solvent        = GBSA
> >     gb_algorithm            = OBC
> >     rgbradii                = 2.0
> >
> >
> >     === Installation commands for the cluster ===
> >
> >     $ ./configure --prefix=/home/rdiv1001/gromacs-4.5 CPPFLAGS="-I/home/rdiv1001/fftw-3.0.1-osx/include" LDFLAGS="-L/home/rdiv1001/fftw-3.0.1-osx/lib" --disable-threads --without-x --program-suffix=_4.5.1_s
> >
> >     $ make
> >
> >     $ make install
> >
> >     $ make distclean
> >
> >     $ ./configure --prefix=/home/rdiv1001/gromacs-4.5 CPPFLAGS="-I/home/rdiv1001/fftw-3.0.1-osx/include" LDFLAGS="-L/home/rdiv1001/fftw-3.0.1-osx/lib" --disable-threads --without-x --program-suffix=_4.5.1_mpi --enable-mpi CXXCPP="/nfs/compilers/mpich-1.2.5/bin/mpicxx -E"
> >
> >     $ make mdrun
> >
> >     $ make install-mdrun
> >
> >
> >     --
> >     ========================================
> >
> >     Justin A. Lemkul
> >     Ph.D. Candidate
> >     ICTAS Doctoral Scholar
> >     MILES-IGERT Trainee
> >     Department of Biochemistry
> >     Virginia Tech
> >     Blacksburg, VA
> >     jalemkul[at]vt.edu <http://vt.edu> | (540) 231-9080
> >     http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
> >
> >     ========================================
> >
> >
> > --
> > ORNL/UT Center for Molecular Biophysics cmb.ornl.gov <http://cmb.ornl.gov>
> > 865-241-1537, ORNL PO BOX 2008 MS6309
>
> --
> ========================================
>
> Justin A. Lemkul
> Ph.D. Candidate
> ICTAS Doctoral Scholar
> MILES-IGERT Trainee
> Department of Biochemistry
> Virginia Tech
> Blacksburg, VA
> jalemkul[at]vt.edu | (540) 231-9080
> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
>
> ========================================
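
The two log checks discussed in this thread (which kernel messages appear in the mdrun log, and whether energy minimization reported a zero or NaN force) could be scripted roughly as follows. This is only a sketch: it assumes the log lines look like the excerpts quoted above, and the function name check_em_log is hypothetical, not part of GROMACS.

```python
import math
import re

# Kernel-selection messages quoted in this thread; other builds may print
# different text, so treat this list as an assumption.
KERNEL_MARKERS = (
    "IBM Power6-specific Fortran kernels",
    "Testing Altivec/VMX support",
)

# Matches the "Maximum force = ..." and "Norm of force = ..." summary lines
# shown in the EM log excerpts above.
FORCE_RE = re.compile(r"^\s*(Maximum force|Norm of force)\s*=\s*(\S+)")


def check_em_log(text):
    """Return (kernel_lines, problems) for the text of an mdrun log.

    kernel_lines: log lines that mention one of KERNEL_MARKERS.
    problems: descriptions of force values that are exactly zero or NaN.
    """
    kernel_lines = []
    problems = []
    for line in text.splitlines():
        if any(marker in line for marker in KERNEL_MARKERS):
            kernel_lines.append(line.strip())
        match = FORCE_RE.match(line)
        if match:
            label, raw = match.groups()
            value = float(raw)  # float() parses "nan" as well as "8.7e+02"
            if math.isnan(value) or value == 0.0:
                problems.append(f"{label} is {raw}")
    return kernel_lines, problems
```

Reading a log with open("md.log").read() and passing the text to check_em_log would flag the PowerPC run above (zero maximum force, NaN norm) while leaving the Intel and AMD64 runs clean. It does not check how many MPI processes were used, since the exact wording of that log line is not shown in this thread.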
-- gmx-users mailing list [email protected] http://lists.gromacs.org/mailman/listinfo/gmx-users Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting! Please don't post (un)subscribe requests to the list. Use the www interface or send it to [email protected]. Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

