Hello listers,

This is my first try with Gromacs (3.3.1). I've installed LAM-MPI (7.1.2) and FFTW3 on my two dual Intel P4 machines running Fedora Core 4 (kernel 2.6). That gives 4 physical CPUs, or 8 with hyperthreading on. I've already read in the mailing list archive that I should turn off hyperthreading until the Gromacs 4 release to improve performance.

Just to test parallel processing, I downloaded one of the benchmark tests (d.lzm) and tried to run it.

I prepared it with:

grompp -f cutoff.mdp -c conf.gro -p topol.top -np 2

(Here I had to read the archives to resist the temptation to add -nt 2, which gave me an error even though I had included --enable-threads in the configure options.)


But when I tried to run it on my two nodes as a parallel task with:

mpirun n0,1 mdrun -s topol.tpr -np 2

I got the following output from mdrun:

NNODES=2, MYRANK=1, HOSTNAME=lead8
NNODES=2, MYRANK=0, HOSTNAME=lead7
NODEID=1 argc=5
NODEID=0 argc=5
CUT SOME MDRUN HELP INFO >>>
-------------------------------------------------------
Program mdrun, VERSION 3.3.1
Source code file: gmxfio.c, line: 706

Can not open file:
topol.tpr
-------------------------------------------------------

"I'm a Jerk" (F. Black)

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 2

gcq#171: "I'm a Jerk" (F. Black)

-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 27859 failed on node n1 (192.168.1.9) with exit status 1.
-----------------------------------------------------------------------------

You can see from lamnodes that node n1 is the originating node:

lamnodes
n0      lead7:2:
n1      192.168.1.9:2:origin,this_node

and from ps -leaf | grep mdrun I can see that both processes have been started, but neither is using any CPU. So far, my guess is that because the originating node (n1) can't read the topol.tpr file, it can't distribute tasks among the nodes (which would explain the error on node 0, the other node).

Any ideas on what's happening? How do I solve it?

Thank you very much !

Guillem Plasencia
Spain.

P.S. I've read in the archives that there was some interest in knowing whether hyperthreading still causes poor load balancing in Linux kernel 2.6, which happens to be the kernel I'm running. I'd be glad to test my nodes with HT both on and off, of course as soon as I solve this problem with the topol.tpr file.


_______________________________________________
gmx-users mailing list    [email protected]
http://www.gromacs.org/mailman/listinfo/gmx-users
Please don't post (un)subscribe requests to the list. Use the www interface or send it to [EMAIL PROTECTED]
Can't post? Read http://www.gromacs.org/mailing_lists/users.php