Hi,

I tried to use 4 and 8 CPUs. There are about 6000 atoms in my system, and the interconnect on our cluster is plain 1 Gb Ethernet, not optical fiber. Sorry for my poor English; I did not express myself well in my question. Every time I submit the parallel job, the nodes assigned to me are already at 100% load, and the CPU share left for my job is less than 10%. I think there is something wrong with my submit script or my executable script; I posted them in my previous message and quote them below. How should I correct my script?
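For diagnosis, a minimal check of what the job actually received might look like the sketch below (lamnodes is LAM's command for listing the booted nodes; the NNODES line is what GROMACS 4.x prints near the top of its output, so the exact wording may differ by version):

----
#!/bin/sh
# Quick diagnostic, assuming the LAM environment from lamscript is booted.
# List the hosts LAM actually booted on:
lamnodes
# Confirm how many MPI ranks mdrun started (GROMACS 4.x prints a line
# like "NNODES=8, MYRANK=0, HOSTNAME=..." near the top of its output):
grep -i "NNODES" /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.log
# See who else is using the CPUs on the node the job landed on:
top -b -n 1 | head -n 20
----

If top shows other users' processes pinning the cores, the problem is machine sharing in the Condor pool rather than anything in GROMACS or LAM.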
Hsin-Lin

> Hi,
>
> How many CPUs do you try to use? How big is your system? What kind of
> interconnect? Since you use Condor, it is probably some pretty slow
> interconnect. Then you can't expect it to work on many CPUs. If you want
> to use many CPUs for MD, you need a faster interconnect.
>
> Roland
>
> 2010/4/2 Hsin-Lin Chiang <[email protected]>
>
> > Hi,
> >
> > Does someone use GROMACS, LAM, and Condor together here?
> > I use GROMACS with LAM/MPI on a Condor system.
> > Every time I submit a parallel job,
> > I get nodes which are already occupied, and the performance of each CPU
> > is below 10%.
> > How should I change the script?
> > Below are one submit script and two executable scripts.
> >
> > condor_mpi:
> > ----
> > # submit description file (parsed by condor_submit, not a shell)
> > Universe = parallel
> > Executable = ./lamscript
> > machine_count = 8
> > output = md_$(NODE).out
> > error = md_$(NODE).err
> > log = md.log
> > arguments = /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md.sh
> > +WantIOProxy = True
> > should_transfer_files = yes
> > when_to_transfer_output = on_exit
> > Queue
> > ----
> >
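The symptom described above, nodes already at 100% load, suggests the match is landing on busy machines. A hedged fix at the submit level, assuming the pool advertises the standard LoadAvg machine attribute, is to require and prefer lightly loaded machines; the 0.3 threshold below is only an illustration:

----
# Hypothetical additions to the condor_mpi submit description
# (adjust the threshold for your pool):
# refuse machines that are already busy...
Requirements = (LoadAvg < 0.3)
# ...and, among those left, prefer the least loaded ones
Rank = -LoadAvg
----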
> > lamscript:
> > ----
> > #!/bin/sh
> >
> > _CONDOR_PROCNO=$_CONDOR_PROCNO
> > _CONDOR_NPROCS=$_CONDOR_NPROCS
> > _CONDOR_REMOTE_SPOOL_DIR=$_CONDOR_REMOTE_SPOOL_DIR
> >
> > SSHD_SH=`condor_config_val libexec`
> > SSHD_SH=$SSHD_SH/sshd.sh
> >
> > CONDOR_SSH=`condor_config_val libexec`
> > CONDOR_SSH=$CONDOR_SSH/condor_ssh
> >
> > # Set this to the bin directory of your LAM installation.
> > # It must also be in your .cshrc file, so the remote side can find it!
> > export LAMDIR=/stathome/jiangsl/soft/lam-7.1.4
> > export PATH=${LAMDIR}/bin:${PATH}
> > export LD_LIBRARY_PATH=/lib:/usr/lib:$LAMDIR/lib:.:/opt/intel/compilers/lib
> >
> > . $SSHD_SH $_CONDOR_PROCNO $_CONDOR_NPROCS
> >
> > # If not the head node, just sleep forever, to let the sshds run.
> > if [ $_CONDOR_PROCNO -ne 0 ]
> > then
> >         wait
> >         sshd_cleanup
> >         exit 0
> > fi
> >
> > EXECUTABLE=$1
> > shift
> >
> > # The binary is copied but the executable flag is cleared,
> > # so the script has to restore it.
> > chmod +x $EXECUTABLE
> >
> > # To allow multiple LAM jobs on a single machine,
> > # we have to give a somewhat unique session suffix.
> > export LAM_MPI_SESSION_SUFFIX=$$
> > export LAMRSH=$CONDOR_SSH
> >
> > # When a job is killed by the user, this script gets SIGTERM.
> > # It has to catch it and clean up the LAM environment.
> > finalize()
> > {
> >         sshd_cleanup
> >         lamhalt
> >         exit
> > }
> > trap finalize TERM
> >
> > CONDOR_CONTACT_FILE=$_CONDOR_SCRATCH_DIR/contact
> > export CONDOR_CONTACT_FILE
> >
> > # The second field in the contact file is the machine name
> > # that condor_ssh knows how to use. Note that this used to
> > # say "sort -n +0 ...", but the -n option is now deprecated.
> > sort < $CONDOR_CONTACT_FILE | awk '{print $2}' > machines
> >
> > # Start the LAM environment.
> > # For older versions of LAM you may need to remove "-ssi boot rsh".
> > lamboot -ssi boot rsh -ssi rsh_agent "$LAMRSH -x" machines
> >
> > if [ $? -ne 0 ]
> > then
> >         echo "lamscript error booting lam"
> >         exit 1
> > fi
> >
> > mpirun C -ssi rpi usysv -ssi coll_smp 1 $EXECUTABLE $@ &
> >
> > CHILD=$!
> > TMP=130
> > while [ $TMP -gt 128 ] ; do
> >         wait $CHILD
> >         TMP=$?
> > done
> >
> > # clean up files
> > sshd_cleanup
> > /bin/rm -f machines
> >
> > # clean up LAM
> > lamhalt
> >
> > exit $TMP
> > ----
> >
> > md.sh:
> > ----
> > #!/bin/sh
> > # run GROMACS
> > /stathome/jiangsl/soft/gromacs-4.0.5/bin/mdrun_mpi_d \
> > -s /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.tpr \
> > -e /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.edr \
> > -o /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.trr \
> > -g /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.log \
> > -c /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.gro
> > ----
> >
> > Hsin-Lin
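As a sanity check once lamboot succeeds, it may also help to confirm where the ranks actually land before starting mdrun, and afterwards to compare throughput between the 4- and 8-CPU runs; a sketch, using the paths from md.sh above:

----
#!/bin/sh
# Assumes the LAM environment booted by lamscript is still up.
# "C" runs one process per scheduled CPU; the ranks should print the
# machines Condor assigned, not the same host name eight times.
mpirun C hostname
# After a run, compare ns/day between CPU counts; on 1 Gb Ethernet a
# ~6000-atom system can stop scaling after only a few CPUs.
grep "Performance" /stathome/jiangsl/simulation/gromacs/2OMP/2OMP_1_1/md/200ns.log
----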

