Nicolas wrote:
Hello,
I'm trying to do a benchmark with Gromacs 4 on our cluster, but I don't
completely understand the results I obtain. The system I used is a 128
DOPC bilayer hydrated by ~18800 SPC for a total of ~70200 atoms. The
size of the system is 9.6x9.6x10.1 nm^3. I'm using the following
parameters:
* nstlist = 10
* rlist = 1
* Coulombtype = PME
* rcoulomb = 1
* fourierspacing = 0.12
* vdwtype = Cutoff
* rvdw = 1
The cluster itself has got 2 procs/node connected by Ethernet 100 MB/s.
Ethernet and Gigabit ethernet are not fast enough to get reasonable
scaling. There've been quite a few posts on this topic in the last six
months.
Hmm I see you've corrected your post to refer to Infiniband with four
cores/node. That should be reasonable, I understand (but search the
archive).
You should also check that your benchmark calculation is long enough
that you are measuring a simulation time that isn't being dominated by
setup costs. Some years ago a non-MD sysadmin complained of poor scaling
when he was testing over 10 or so MD steps!
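To see why a very short benchmark misleads, here is a minimal sketch of how a fixed setup cost distorts the apparent throughput. The time step, per-step cost and setup time are hypothetical values chosen for illustration, not measurements from this thread.

```python
def apparent_ns_per_day(nsteps, dt_ps, step_seconds, setup_seconds):
    """Throughput implied by total wall time, including one-off setup cost."""
    simulated_ns = nsteps * dt_ps / 1000.0  # ps -> ns
    wall_days = (setup_seconds + nsteps * step_seconds) / 86400.0
    return simulated_ns / wall_days


if __name__ == "__main__":
    # Hypothetical values: 2 fs time step, 50 ms per MD step, 10 s of setup.
    dt_ps, step_seconds, setup_seconds = 0.002, 0.05, 10.0
    for nsteps in (10, 1000, 100000):
        perf = apparent_ns_per_day(nsteps, dt_ps, step_seconds, setup_seconds)
        print(f"{nsteps:>7d} steps: {perf:.2f} ns/day")
```

With only a handful of steps the setup term dominates the wall time; the figure only approaches the true per-step rate once nsteps * step_seconds is much larger than setup_seconds.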
I'm using mpiexec to run Gromacs. When I use -npme 2 -ddorder
interleave, I get:
ncore   Perf (ns/day)   PME (%)
    1            0.00         0
    2            0.00         0
    3            0.00         0
    4            1.35        28
    5            1.84        31
    6            2.08        27
    8            2.09        21
   10            2.25        17
   12            2.02        15
   14            2.20        13
   16            2.04        11
   18            2.18        10
   20            2.29         9
So, above 6-8 cores, the PP nodes spend too much time waiting for the
PME nodes and the performance plateaus.
That's not surprising - the heuristic is that about a third to a quarter
of the cores want to be PME-only nodes. Of course, that depends on the
relative size of the real- and reciprocal-space parts of the calculation.
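As a quick check of that heuristic against the core counts in the table, this small sketch computes what fraction of the cores -npme 2 dedicates to PME and whether it lies between a quarter and a third. The function names are made up for illustration; only the core counts and the 2 PME nodes come from the post.

```python
from fractions import Fraction

NPME = 2
CORE_COUNTS = [4, 5, 6, 8, 10, 12, 14, 16, 18, 20]


def pme_fraction(npme, ncores):
    """Exact fraction of all cores that act as PME-only nodes."""
    return Fraction(npme, ncores)


def within_heuristic(npme, ncores, low=Fraction(1, 4), high=Fraction(1, 3)):
    """True if the PME-only share lies in the quarter-to-third range."""
    return low <= pme_fraction(npme, ncores) <= high


if __name__ == "__main__":
    for ncores in CORE_COUNTS:
        share = float(pme_fraction(NPME, ncores))
        flag = "ok" if within_heuristic(NPME, ncores) else "outside"
        print(f"{ncores:>2d} cores: {share:.0%} PME-only ({flag})")
```

Only the 6- and 8-core runs fall inside the range; beyond that, two PME nodes are an ever smaller share of the machine, which fits the plateau in the table.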
When I use -npme 0, I get:
ncore   Perf (ns/day)   PME (%)
    1            0.43        33
    2            0.92        34
    3            1.34        35
    4            1.69        36
    5            2.17        33
    6            2.56        32
    8            3.24        33
   10            3.84        34
   12            4.34        35
   14            5.05        32
   16            5.47        34
   18            5.54        37
   20            6.13        36
I obtain much better performance when there are no PME nodes, while I
was expecting the opposite. Does someone have an explanation for that?
Does that mean domain decomposition is useless below a certain real
space cutoff? I'm quite confused.
The relevant observations are those for 4, 5, 6 and 8 cores, for which
shared-duty is out-performing -npme 2. I think your observations support
the conclusion that your network hardware is more limiting for PME-only
nodes than for shared-duty nodes. They don't support the conclusion that
DD is useless, since you haven't compared with particle decomposition
(PD).
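For reference, the comparison can be put into numbers with a short sketch that uses only the values from the two tables above. The efficiency definition (performance divided by core count times single-core performance) is the usual one, but the names here are chosen for illustration.

```python
# Performance in ns/day, copied from the two tables in this thread.
NPME2 = {4: 1.35, 5: 1.84, 6: 2.08, 8: 2.09, 10: 2.25, 12: 2.02,
         14: 2.20, 16: 2.04, 18: 2.18, 20: 2.29}
SHARED_DUTY = {1: 0.43, 2: 0.92, 3: 1.34, 4: 1.69, 5: 2.17, 6: 2.56,
               8: 3.24, 10: 3.84, 12: 4.34, 14: 5.05, 16: 5.47,
               18: 5.54, 20: 6.13}


def parallel_efficiency(perf_by_cores, ncores):
    """Performance relative to ideal linear scaling from one core."""
    return perf_by_cores[ncores] / (ncores * perf_by_cores[1])


def shared_duty_advantage(ncores):
    """Ratio of the -npme 0 performance to the -npme 2 performance."""
    return SHARED_DUTY[ncores] / NPME2[ncores]


if __name__ == "__main__":
    for ncores in (4, 5, 6, 8):
        print(f"{ncores} cores: shared-duty is "
              f"{shared_duty_advantage(ncores):.2f}x faster, efficiency "
              f"{parallel_efficiency(SHARED_DUTY, ncores):.0%}")
```

The single-core baseline exists only for the -npme 0 runs, so the efficiency is computed for that series alone.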
You can play with the PME parameters to shift more load into the
real-space part - IIRC Carsten suggested a heuristic a few months back.
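One commonly described way of shifting load is to scale rcoulomb and fourierspacing by the same factor, which is generally said to keep PME accuracy roughly constant. Whether that matches the heuristic Carsten suggested is an assumption on my part. The sketch below applies such a factor to the values from the post and gives a rough cost estimate, assuming real-space work grows with the cutoff volume and the grid size shrinks with the cube of the spacing. Other cutoffs (rlist, rvdw) may need to follow; check the manual for your version.

```python
def scale_pme_settings(rcoulomb, fourierspacing, factor):
    """Scale the Coulomb cutoff and the PME grid spacing by one factor."""
    return rcoulomb * factor, fourierspacing * factor


def relative_work(factor):
    """Rough estimate: (real-space pairs, PME grid points) vs. factor 1.

    Assumes pair count ~ cutoff**3 and grid points ~ spacing**-3.
    """
    return factor ** 3, factor ** -3


if __name__ == "__main__":
    # rcoulomb = 1 and fourierspacing = 0.12 are the values from the post;
    # the scaling factors are hypothetical.
    for factor in (1.0, 1.1, 1.2):
        rc, spacing = scale_pme_settings(1.0, 0.12, factor)
        real, grid = relative_work(factor)
        print(f"factor {factor:.1f}: rcoulomb = {rc:.2f} nm, "
              f"fourierspacing = {spacing:.3f} nm, "
              f"real-space x{real:.2f}, grid x{grid:.2f}")
```

These are only scaling estimates; the actual balance has to be measured on the cluster in question.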
Mark