Roland Schulz wrote:
Justin,

I think the interaction kernel is not working correctly on your PowerPC machine. I infer that from two things: 1) The force seems to be zero (minimization output). 2) When you use the all-to-all kernel, which is not available among the PowerPC kernels, it automatically falls back to the C kernel, and then it works.


Sounds about right.

Which kernel are you using? The log file should say. Look for: "Configuring single precision IBM Power6-specific Fortran kernels" or "Testing Altivec/VMX support"


I'm not finding either in the config.log - weird?

You can also check in config.h whether GMX_POWER6 and/or GMX_PPC_ALTIVEC is set. I suggest you try compiling with one or both of them deactivated and see whether that solves it. This will also make it slower. So if this is indeed the problem, you will probably want to figure out why the fastest kernel doesn't work correctly, in order to get good performance.


It looks like GMX_PPC_ALTIVEC is set. I suppose I could re-compile with this turned off.
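
For anyone repeating this check, here is a minimal sketch of how the config.h test could be scripted. It assumes an autoconf-generated config.h, where enabled options appear as "#define NAME ..." and disabled ones as "/* #undef NAME */". The function name kernel_macros is made up for illustration, and the location of config.h inside the build tree is not stated in this thread, so the path is passed in as an argument.

```shell
#!/bin/sh
# Print the PowerPC kernel macros that are actively defined in config.h.
# Commented-out "/* #undef NAME */" lines are ignored because only lines
# that start with "#define" are matched.
# kernel_macros is a hypothetical helper name, not part of GROMACS.
kernel_macros() {
    # $1: path to config.h (location in the build tree is an assumption)
    grep -E '^#define[[:space:]]+(GMX_POWER6|GMX_PPC_ALTIVEC)([[:space:]]|$)' "$1" |
        awk '{ print $2 }'
}
```

Calling kernel_macros with the path to config.h prints one macro name per line. Empty output would suggest that neither accelerated kernel was enabled at configure time.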

Here's what's even weirder. The problematic version was compiled using the standard autoconf procedure. If I use a CMake-compiled version, the energy minimization runs fine, giving the same results (energy and force) as the two systems I know are good. So I guess there's something wrong with the way autoconf installed Gromacs. Perhaps this isn't of concern since Gromacs will require CMake in subsequent releases, but I figure I should at least report it in case it affects anyone else.

If I may tack one more question on here, I'm wondering why my CMake installation doesn't actually appear to be using MPI. I get the right result, but the problem is, I get a .log, .edr, and .trr for every processor that's being used, as if each processor is being given its own job and not distributing the work. Here's how I compiled my MPI mdrun, version 4.5.1:

cmake ../gromacs-4.5.1 -DFFTW3F_LIBRARIES=/home/rdiv1001/fftw-3.0.1-osx/lib/libfftw3f.a -DFFTW3F_INCLUDE_DIR=/home/rdiv1001/fftw-3.0.1-osx/include/ -DCMAKE_INSTALL_PREFIX=/home/rdiv1001/gromacs-4.5_cmake-osx -DGMX_BINARY_SUFFIX=_4.5_cmake_mpi -DGMX_THREADS=OFF -DBUILD_SHARED_LIBS=OFF -DGMX_X11=OFF -DGMX_MPI=ON -DMPI_COMPILER=/home/rdiv1001/compilers/openmpi-1.2.3-osx/bin/mpicxx -DMPI_INCLUDE_PATH=/home/rdiv1001/compilers/openmpi-1.2.3-osx/include

$ make mdrun

$ make install-mdrun

Is there anything obviously wrong with those commands? Is there any way I should know (before actually using mdrun) whether or not I've done things right?
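
One check that can be done before running mdrun is to read back what CMake actually recorded for the build. The sketch below relies only on the standard CMakeCache.txt layout (NAME:TYPE=VALUE); the helper name cmake_cache_value is hypothetical. It does not prove that the binary is linked correctly, only that the option was accepted.

```shell
#!/bin/sh
# Print the value stored for one variable in a CMakeCache.txt file.
# Cache entries have the form NAME:TYPE=VALUE, e.g. GMX_MPI:BOOL=ON.
# cmake_cache_value is a hypothetical helper name.
cmake_cache_value() {
    # $1: variable name, $2: path to CMakeCache.txt in the build directory
    sed -n "s/^$1:[A-Za-z]*=//p" "$2"
}
```

Run from the build directory, "cmake_cache_value GMX_MPI CMakeCache.txt" should print ON for an MPI build. A separate thing worth checking, offered only as a possibility: the cmake command above points at Open MPI 1.2.3, while the original report says the cluster's native MPI is MPICH-1.2.5. If the binary is launched with an mpirun from a different MPI implementation than the one it was linked against, each process typically starts as an independent single-rank job, which would match getting one .log, .edr, and .trr per processor.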

-Justin

Roland


On Mon, Sep 27, 2010 at 4:59 PM, Justin A. Lemkul wrote:


    Hi All,

    I'm hoping I might get some tips in tracking down the source of an
    issue that appears to be hardware-specific, leading to crashes in my
    system.  The failures are occurring on our supercomputer (Mac OSX
    10.3, PowerPC).  Running the same .tpr file on my laptop (Mac OSX
    10.5.8, Intel Core2Duo) and on another workstation (Ubuntu 10.04,
    AMD64) produces identical results.  I suspect the problem stems from
    unsuccessful energy minimization, which then leads to a crash when
    running full MD.  All jobs were run in parallel on two cores.  The
    supercomputer does not support threading, so MPI is invoked using
    MPICH-1.2.5 (native MPI implementation on the cluster).


    Details as follows:

    EM md.log file: successful run (Intel Core2Duo or AMD64)

    Steepest Descents converged to Fmax < 1000 in 7 steps
    Potential Energy  = -4.8878180e+04
    Maximum force     =  8.7791553e+02 on atom 5440
    Norm of force     =  1.1781271e+02


    EM md.log file: unsuccessful run (PowerPC)

    Steepest Descents converged to Fmax < 1000 in 1 steps
    Potential Energy  = -2.4873273e+04
    Maximum force     =  0.0000000e+00 on atom 0
    Norm of force     =            nan
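
The failure signature in the two summaries above (a maximum force of exactly zero and a norm of nan) can be tested for automatically. This is a minimal sketch that assumes the log contains "Maximum force" and "Norm of force" lines laid out as shown above; the function name em_forces_ok is made up for illustration.

```shell
#!/bin/sh
# Succeed (exit status 0) only if the EM summary reports a finite,
# non-zero maximum force and a finite norm of force.
# em_forces_ok is a hypothetical helper name.
em_forces_ok() {
    # $1: path to the energy minimization log
    awk '
        /Maximum force/ {
            seen_max = 1
            # Field 4 is the value: "Maximum force = <value> on atom N"
            if (tolower($4) ~ /nan|inf/ || $4 + 0 == 0) bad = 1
        }
        /Norm of force/ {
            seen_norm = 1
            # Field 5 is the value: "Norm of force = <value>"
            if (tolower($5) ~ /nan|inf/) bad = 1
        }
        END { if (bad || !seen_max || !seen_norm) exit 1 }
    ' "$1"
}
```

With the values quoted above, the Intel/AMD64 summary passes this check and the PowerPC summary fails it.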


    MD invoked from the minimized structure generated on my laptop or
    AMD64 runs successfully (at least for a few hundred steps in my
    test), but the MD on the PowerPC cluster fails immediately:

              Step           Time         Lambda
                 0        0.00000        0.00000

      Energies (kJ/mol)
                 U-B     Proper Dih.   Improper Dih.       CMAP Dih. GB Polarization
         7.93559e+03     9.34958e+03     2.24036e+02    -2.47750e+03    -7.83599e+04
               LJ-14      Coulomb-14         LJ (SR)    Coulomb (SR)       Potential
         7.70042e+03     9.94520e+04    -1.17168e+04    -5.79783e+04    -2.55780e+04
         Kinetic En.    Total Energy     Temperature  Pressure (bar)    Constr. rmsd
                 nan             nan             nan     0.00000e+00             nan
       Constr.2 rmsd
                 nan

    DD  step 9 load imb.: force  3.0%


    -------------------------------------------------------
    Program mdrun_4.5.1_mpi, VERSION 4.5.1
    Source code file: nsgrid.c, line: 601

    Range checking error:
    Explanation: During neighborsearching, we assign each particle to a grid
    based on its coordinates. If your system contains collisions or parameter
    errors that give particles very high velocities you might end up with some
    coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
    put these on a grid, so this is usually where we detect those errors.
    Make sure your system is properly energy-minimized and that the potential
    energy seems reasonable before trying again.
    Variable ind has value 7131. It should have been within [ 0 .. 7131 ]

    For more information and tips for troubleshooting, please check the GROMACS
    website at http://www.gromacs.org/Documentation/Errors
    -------------------------------------------------------

    It seems as if the crash really shouldn't be happening, if the value
    range is inclusive.
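
A possible reading of the message, stated as an assumption and not checked against nsgrid.c: the system in this report has 7131 atoms, so with zero-based indexing the valid indices are 0 to 7130, and the printed upper bound may be exclusive even though the brackets suggest otherwise. The sketch below only illustrates that half-open test; in_half_open_range is a hypothetical name.

```shell
#!/bin/sh
# Half-open range test: succeeds when lo <= n < hi.
# in_half_open_range is a hypothetical helper name used for illustration.
in_half_open_range() {
    # $1: value n, $2: lower bound lo (inclusive), $3: upper bound hi (exclusive)
    [ "$1" -ge "$2" ] && [ "$1" -lt "$3" ]
}
```

Under that reading, "in_half_open_range 7130 0 7131" succeeds while "in_half_open_range 7131 0 7131" fails, which would be consistent with the error being raised for ind = 7131.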

    Running with all-vs-all kernels works, but the performance is
    horrendously slow (<300 ps per day for a 7131-atom system) so I am
    attempting to use long cutoffs (2.0 nm) as others on the list have
    suggested.

    Details of the installations and .mdp files are appended below.

    -Justin

    === em.mdp ===
    ; Run parameters
    integrator      = steep         ; EM
    emstep      = 0.005
    emtol       = 1000
    nsteps      = 50000
    nstcomm         = 1
    comm_mode   = angular       ; non-periodic system
    ; Bond parameters
    constraint_algorithm    = lincs
    constraints             = all-bonds
    continuation    = no            ; starting up
    ; required cutoffs for implicit
    nstlist         = 1
    ns_type         = grid
    rlist           = 2.0
    rcoulomb        = 2.0
    rvdw            = 2.0
    ; cutoffs required for qq and vdw
    coulombtype     = cut-off
    vdwtype     = cut-off
    ; temperature coupling
    tcoupl          = no
    ; Pressure coupling is off
    Pcoupl          = no
    ; Periodic boundary conditions are off for implicit
    pbc                 = no
    ; Settings for implicit solvent
    implicit_solvent    = GBSA
    gb_algorithm        = OBC
    rgbradii            = 2.0


    === md.mdp ===

    ; Run parameters
    integrator      = sd            ; velocity Langevin dynamics
    dt                  = 0.002
    nsteps          = 2500000               ; 5000 ps (5 ns)
    nstcomm         = 1
    comm_mode   = angular       ; non-periodic system
    ; Output parameters
    nstxout         = 0             ; nst[xvf]out = 0 to suppress useless .trr output
    nstvout         = 0
    nstfout         = 0
    nstlog      = 5000          ; 10 ps
    nstenergy   = 5000          ; 10 ps
    nstxtcout   = 5000          ; 10 ps
    ; Bond parameters
    constraint_algorithm    = lincs
    constraints             = all-bonds
    continuation    = no            ; starting up
    ; required cutoffs for implicit
    nstlist         = 10
    ns_type         = grid
    rlist           = 2.0
    rcoulomb        = 2.0
    rvdw            = 2.0
    ; cutoffs required for qq and vdw
    coulombtype     = cut-off
    vdwtype     = cut-off
    ; temperature coupling
    tc_grps         = System
    tau_t           = 1.0   ; inverse friction coefficient for Langevin (ps^-1)
    ref_t           = 310
    ; Pressure coupling is off
    Pcoupl          = no
    ; Generate velocities is on
    gen_vel         = yes
    gen_temp        = 310
    gen_seed        = 173529
    ; Periodic boundary conditions are off for implicit
    pbc                 = no
    ; Free energy must be off to use all-vs-all kernels
    ; default, but just for the sake of being pedantic
    free_energy = no
    ; Settings for implicit solvent
    implicit_solvent    = GBSA
    gb_algorithm        = OBC
    rgbradii            = 2.0
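
As a cross-check on the run length in md.mdp, the simulated time is nsteps * dt = 2500000 * 0.002 ps = 5000 ps. At the all-vs-all rate quoted earlier (under 300 ps per day) that run would take more than 5000 / 300, or roughly 16.7, days. The sketch below computes the same product from an .mdp file. It assumes simple "key = value ; comment" lines, and the function name mdp_run_length_ps is made up for illustration.

```shell
#!/bin/sh
# Print the simulated time in ps (nsteps * dt) for an .mdp file.
# Lines are split on "=" and ";" so that trailing comments are ignored.
# mdp_run_length_ps is a hypothetical helper name.
mdp_run_length_ps() {
    # $1: path to the .mdp file
    awk -F'[=;]' '
        { key = $1; gsub(/[[:space:]]/, "", key) }
        key == "dt"     { dt = $2 + 0 }
        key == "nsteps" { nsteps = $2 + 0 }
        END { printf "%g\n", nsteps * dt }
    ' "$1"
}
```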


    === Installation commands for the cluster ===

    $ ./configure --prefix=/home/rdiv1001/gromacs-4.5
    CPPFLAGS="-I/home/rdiv1001/fftw-3.0.1-osx/include"
    LDFLAGS="-L/home/rdiv1001/fftw-3.0.1-osx/lib" --disable-threads
    --without-x --program-suffix=_4.5.1_s

    $ make

    $ make install

    $ make distclean

    $ ./configure --prefix=/home/rdiv1001/gromacs-4.5
    CPPFLAGS="-I/home/rdiv1001/fftw-3.0.1-osx/include"
    LDFLAGS="-L/home/rdiv1001/fftw-3.0.1-osx/lib" --disable-threads
    --without-x --program-suffix=_4.5.1_mpi --enable-mpi
    CXXCPP="/nfs/compilers/mpich-1.2.5/bin/mpicxx -E"

    $ make mdrun

    $ make install-mdrun


-- ========================================

    Justin A. Lemkul
    Ph.D. Candidate
    ICTAS Doctoral Scholar
    MILES-IGERT Trainee
    Department of Biochemistry
    Virginia Tech
    Blacksburg, VA
    jalemkul[at]vt.edu | (540) 231-9080
    http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

    ========================================
-- gmx-users mailing list [email protected]
    <mailto:[email protected]>
    http://lists.gromacs.org/mailman/listinfo/gmx-users
    Please search the archive at
    http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
    Please don't post (un)subscribe requests to the list. Use the www
    interface or send it to [email protected]
    <mailto:[email protected]>.
    Can't post? Read http://www.gromacs.org/Support/Mailing_Lists




--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309
