On Tuesday 16 February 2010, Jeff Squyres wrote:
> We've only got 2 "critical" 1.5.0 bugs left, and I think that those will
> both be closed out pretty soon.
>
>     https://svn.open-mpi.org/trac/ompi/report/15
>
> Rainer and I both feel that a RC for 1.5.0 could be pretty soon.
>
> Does anyone have any heartburn with this?  Does anyone have any things they
> still need to get in v1.5.0?

I noticed that 1.5a1r22627 still has a very suboptimal default selection of 
(at least) the alltoall algorithm. This has been mentioned several times since 
the first major discussion [1], but nothing seems to have improved.

A short recap of the situation: by default, ompi switches from bruck to basic 
linear at a message size of roughly 100 bytes, and this is bad<tm>. The first 
set of figures below is with vanilla ompi; the second set is with a dynamic 
rules file [2] that forces bruck for all message sizes. For details on the 
system, see [3].

The problem is equally visible over tcp and openib. A concrete result is that 
Open MPI over IB is far slower than other MPIs over 1G Ethernet for the 
affected message sizes (100-3000 bytes).

[cap@n115 mpi]$ mpirun --host $(hostlist --expand -s',' 
$SLURM_JOB_NODELIST) --bind-to-core  ./alltoall.ompi15a1r22627 
profile.ompibadness
running in profile-from-file mode
bw for   400     x 1 B :   2.0 Mbytes/s          time was:  24.9 ms
bw for   400     x 25 B :  52.8 Mbytes/s         time was:  23.9 ms
bw for   400     x 50 B :  82.2 Mbytes/s         time was:  30.7 ms
bw for   400     x 75 B :  90.4 Mbytes/s         time was:  41.8 ms
bw for   400     x 100 B : 109.2 Mbytes/s        time was:  46.1 ms
bw for   400     x 200 B :   4.8 Mbytes/s        time was:   2.1 s
bw for   400     x 300 B :   7.0 Mbytes/s        time was:   2.2 s
bw for   400     x 400 B :   9.8 Mbytes/s        time was:   2.1 s
bw for   400     x 500 B :  12.3 Mbytes/s        time was:   2.0 s
bw for   400     x 750 B :  18.5 Mbytes/s        time was:   2.0 s
bw for   400     x 1000 B :  24.6 Mbytes/s       time was:   2.0 s
bw for   400     x 1250 B :  29.9 Mbytes/s       time was:   2.1 s
bw for   400     x 1500 B :  35.1 Mbytes/s       time was:   2.2 s
bw for   400     x 2000 B :  45.5 Mbytes/s       time was:   2.2 s
bw for   400     x 2500 B :  51.0 Mbytes/s       time was:   2.5 s
bw for   400     x 3000 B : 113.6 Mbytes/s       time was:   1.3 s
bw for   400     x 3500 B : 123.3 Mbytes/s       time was:   1.4 s
bw for   400     x 4000 B : 135.7 Mbytes/s       time was:   1.5 s
totaltime was:  25.8 s
[cap@n115 mpi]$ mpirun --host $(hostlist --expand -s',' 
$SLURM_JOB_NODELIST) --bind-to-core -mca coll_tuned_use_dynamic_rules 1 -mca 
coll_tuned_dynamic_rules_filename ./dyn_rules ./alltoall.ompi15a1r22627 
profile.ompibadness
running in profile-from-file mode
bw for   400     x 1 B :   2.1 Mbytes/s          time was:  24.3 ms
bw for   400     x 25 B :  55.1 Mbytes/s         time was:  22.9 ms
bw for   400     x 50 B :  82.6 Mbytes/s         time was:  30.5 ms
bw for   400     x 75 B :  89.4 Mbytes/s         time was:  42.3 ms
bw for   400     x 100 B : 109.9 Mbytes/s        time was:  45.9 ms
bw for   400     x 200 B : 115.1 Mbytes/s        time was:  87.6 ms
bw for   400     x 300 B : 117.8 Mbytes/s        time was: 128.3 ms
bw for   400     x 400 B : 105.4 Mbytes/s        time was: 191.2 ms
bw for   400     x 500 B : 113.4 Mbytes/s        time was: 222.1 ms
bw for   400     x 750 B : 119.3 Mbytes/s        time was: 316.9 ms
bw for   400     x 1000 B : 120.9 Mbytes/s       time was: 416.9 ms
bw for   400     x 1250 B : 121.0 Mbytes/s       time was: 520.6 ms
bw for   400     x 1500 B : 120.3 Mbytes/s       time was: 628.2 ms
bw for   400     x 2000 B : 118.0 Mbytes/s       time was: 854.1 ms
bw for   400     x 2500 B :  96.5 Mbytes/s       time was:   1.3 s
bw for   400     x 3000 B : 107.4 Mbytes/s       time was:   1.4 s
bw for   400     x 3500 B : 109.1 Mbytes/s       time was:   1.6 s
bw for   400     x 4000 B : 109.2 Mbytes/s       time was:   1.8 s
totaltime was:   9.7 s
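
For completeness, the benchmark loop itself is nothing fancy. Below is a 
minimal sketch of the kind of loop that produces numbers like the above. This 
is not the actual alltoall.ompi* source (which reads its sizes from the 
profile file); in particular, the bandwidth accounting is my assumption that 
bytes are counted as sent plus received per rank, which seems consistent with 
the figures above.

/* Minimal MPI_Alltoall timing sketch (hypothetical, not the actual
 * alltoall.ompi* benchmark).  Build with e.g.: mpicc -O2 alltoall_bw.c */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const int iters = 400;            /* the "400 x" in the output above */
    const int sizes[] = { 1, 25, 50, 75, 100, 200, 300, 400, 500,
                          750, 1000, 1250, 1500, 2000, 2500, 3000,
                          3500, 4000 };
    int nprocs, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (size_t s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++) {
        int msg = sizes[s];
        char *sendbuf = malloc((size_t)msg * nprocs);
        char *recvbuf = malloc((size_t)msg * nprocs);
        memset(sendbuf, 1, (size_t)msg * nprocs);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++)
            MPI_Alltoall(sendbuf, msg, MPI_BYTE,
                         recvbuf, msg, MPI_BYTE, MPI_COMM_WORLD);
        double t = MPI_Wtime() - t0;

        /* Assumed accounting: bytes sent plus received per rank,
         * excluding the local copy to self. */
        double bytes = 2.0 * iters * (nprocs - 1) * (double)msg;
        if (rank == 0)
            printf("bw for %d x %d B : %.1f Mbytes/s\ttime was: %.1f ms\n",
                   iters, msg, bytes / t / 1e6, t * 1e3);

        free(sendbuf);
        free(recvbuf);
    }

    MPI_Finalize();
    return 0;
}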

[1] [OMPI users] scaling problem with openmpi
    From: Roman Martonak <r.marto...@gmail.com>
    To: us...@open-mpi.org
    Date: 2009-05-16 00.20

[2]:
 1 # num of collectives
 3 # ID = 3 Alltoall collective (ID in coll_tuned.h)
 1 # number of com sizes
 32 # comm size
 1 # number of msg sizes
 0 3 0 0 # for message size 0: bruck (algorithm 3), topo 0, 0 segmentation
 # end of first collective

[3]:
 OpenMPI: Built with intel-11.1.074; the only configure options used were:
  --enable-orterun-prefix-by-default
  --prefix
 OS: CentOS-5.4 x86_64
 HW: Dual E5520 nodes with IB (ConnectX)
 Size of job: 8 nodes (that is 64 cores/ranks)

/Peter
