Hello GROMACS world,

I would like to report some preliminary benchmark results I recently
obtained on two different clusters (Opteron single-core and Opteron
dual-core, both on InfiniBand) using the current CVS version of GROMACS.

From these preliminary tests, the speedup and scaling seem quite poor for the DPPC case. As reported below, the best speedup is obtained with the PGI 6.2 + ACML 3.6 combination: 10.9 on 16 CPUs, i.e. 68% scaling. The same DPPC benchmark run with GROMACS 3.3.1 gives better results, with nearly linear speedup up to 16 CPUs and 80% scaling [results not reported here, but available on request].

It seems I will never reach the impressive scaling reported in many posts about this new domain-decomposition release... What's wrong with my benchmarks? Am I missing something?

Thank you in advance for your kind attention.

luca

#####################################################################

METHOD:

I checked out the CVS version and built it into several targets, one for each supported compiler installed on our cluster, using the MVAPICH-0.9.8 driver (see the ### CONFIG DETAILS ### section below).

I ran all the benchmarks from the gmxbench-3.0.tar.gz package
(to fix: the "title" keyword in grompp.mdp is always set to dppc in all
benchmarks). I used separate directories for the two lzm cases (one for
the PME setup and one for the simple-cutoff one).

Anyway, since we are looking for speedup and scaling up to 32 CPUs, I focused on the DPPC test case (about 130000 atoms): for the smaller systems the number of atoms per process would drop so much that the working set fits into the cache, distorting the benchmark.
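As a rough sanity check on that argument, here is a quick sketch of the atoms-per-rank count for the ~130000-atom DPPC system (a uniform estimate only; the actual per-rank counts depend on the decomposition):

```shell
#!/bin/sh
# Approximate atoms per MPI rank for the DPPC system (~130000 atoms),
# over the same rank counts used in the benchmarks below.
atoms=130000
for p in 1 2 4 8 16 32; do
  echo "${p} ranks: $((atoms / p)) atoms/rank"
done
```

Even at 32 ranks DPPC still keeps ~4000 atoms per process, while the smaller benchmark systems fall well below that.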

The mdrun program from CVS uses a domain-decomposition scheme, so I chose to split the domain along the Y-axis. However, further benchmarks were also run with a full decomposition along all axes (for example: -d 2 2 2 on 8 CPUs, -d 4 4 2 on 32 CPUs), without any significant improvement.

Benchmarks were run with 1, 2, 4, 8, 16 and 32 processes, using the following command (taken from my script):

[SNIP]

for dir in d.dppc d.lzm-cutoff d.lzm-pme d.poly-ch2 d.villin; do
  for proc in 1 2 4 8 16 32; do
    # setting up benchmark directory
[SNIP]

    # running benchmark
    grompp
    /usr/bin/time mpiexec -n ${proc} mdrun_mpi -d 1 ${proc} 1

    # collect results
[SNIP]

  done
done



RESULTS:

I report some results for the DPPC case, where:
- proc is the number of MPI processes (not processors!)
- (DC) means the run used the dual-core DCORE cluster;
  otherwise the single-core INODE cluster.
- no multi-threading was used.

# Real time in seconds for the run (taken from md0.log)
#proc   gnu4.1    pgi6.2   intel9.1   gnu(DC)    pgi(DC)
 1    3084.450  3929.020  3632.470   3053.370   3304.330
 2    1810.000  1977.000  1771.000   1805.000   2080.000
 4    1093.000  1182.000  1077.000   1101.000   1206.000
 8     610.000   650.000   604.000    599.000    653.000
16     336.000   360.000   340.000    339.000    364.000
32     202.000   210.000   243.000    207.000    210.000

# speedup = T_1/T_N, where T_N is the real time on N = $proc processes
#proc   gnu4.1    pgi6.2   intel9.1   gnu(DC)  pgi(DC)
 1       1.00      1.00      1.00      1.00     1.00
 2       1.70      1.99      2.05      1.69     1.59
 4       2.82      3.32      3.37      2.77     2.74
 8       5.06      6.04      6.01      5.10     5.06
16       9.18     10.91     10.68      9.01     9.08
32      15.27     18.71     14.95     14.75    15.73

# scaling = T_1/(N*T_N), where T_N is the real time on N = $proc processes
#proc   gnu4.1    pgi6.2   intel9.1   gnu(DC)   pgi(DC)
 1     100.00%   100.00%   100.00%    100.00%   100.00%
 2      85.21%    99.37%   102.55%     84.58%    79.43%
 4      70.55%    83.10%    84.32%     69.33%    68.50%
 8      63.21%    75.56%    75.18%     63.72%    63.25%
16      57.37%    68.21%    66.77%     56.29%    56.74%
32      47.72%    58.47%    46.71%     46.10%    49.17%
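For reference, this is how the speedup and scaling columns are derived from the raw times in the first table (a minimal awk sketch, using the pgi6.2 times for 1 and 16 processes as an example):

```shell
#!/bin/sh
# speedup = T_1/T_N, scaling = T_1/(N*T_N), from two wall-clock times.
t1=3929.020   # pgi6.2, 1 process  (from the "Real time" table)
tn=360.000    # pgi6.2, 16 processes
n=16
awk -v t1="$t1" -v tn="$tn" -v n="$n" 'BEGIN {
    speedup = t1 / tn
    scaling = t1 / (n * tn) * 100
    printf "speedup = %.2f, scaling = %.2f%%\n", speedup, scaling
}'
```

This reproduces the 10.91 speedup and 68.21% scaling quoted above for pgi6.2 on 16 CPUs.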



############## PLATFORM DETAILS ######################

INODE CLUSTER:
- 24 nodes - 2way Opteron (single-core rev 250) at 2.4GHz with 4 GB RAM
- InfiniBand - SilverStorm InfiniHost III SDR
- switch SilverStorm 9120 InfiniBand 4X DDR 20Gb/s

DCORE CLUSTER:
- 24 nodes - 2way Opteron (dual-core rev 280) at 2.4GHz with 8 GB RAM
- InfiniBand - SilverStorm InfiniHost III DDR
- switch SilverStorm 9120 InfiniBand 4X DDR 20Gb/s


############## CONFIG DETAILS ######################
BUILT TARGETS:
- intel-9.1, MKL 8.1, FFTW3
- gnu-4.1, ACML 3.6, FFTW3
- pgi-6.2, ACML 3.6, FFTW3

MPIMODULE:
MPI on driver mvapich-0.9.8

# Configure for general installation:
./configure --prefix=$PWD/gromax4_${TARGET} \
        --with-fft=fftw3 \
        --without-xml --disable-threads \
        --with-external-blas --with-external-lapack

# Configure for the MPI version of mdrun program:
./configure --prefix=$PWD/gromax4_${TARGET} \
        --enable-mpi --program-suffix=_${MPIMODULE} \
        --with-fft=fftw3 \
        --without-xml --disable-threads \
        --with-external-blas --with-external-lapack
##################################################

--
+------------------------------------------+
| Luca Ferraro
| Gruppo Scienze dei Materiali
| CASPUR (Consorzio per le Applicazioni di
| Supercalcolo Per Università e Ricerca)
| Via dei Tizii, 6b - 00185 ROMA
| Tel. +39-06-44486717
| Fax: +39-06-4957083
| cell: +39-339-7879898
| Email: [EMAIL PROTECTED]
| Web: http://www.caspur.it
+------------------------------------------+

_______________________________________________
gmx-users mailing list    [email protected]
http://www.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to [EMAIL PROTECTED]
Can't post? Read http://www.gromacs.org/mailing_lists/users.php
