On Friday 10 July 2009, Jeff Squyres wrote:
> http://www.open-mpi.org/software/ompi/v1.3/
>
> Please test!

Built and ran just like(*) 1.3.2 in my limited tests (that is, it worked quite
well).

OS: CentOS-5.3.x86_64 with its own OFED
HW: ConnectX-DDR on a Nehalem dual-quad platform
Size: 4 nodes
Compilers: Intel-11.0-074 (built with C/C++/F90, tested C and F90)

(*) It seems to still have the problem reported in:

 [OMPI users] scaling problem with openmpi
 From: Roman Martonak <r.marto...@gmail.com>
 To: us...@open-mpi.org
 Date: 2009-05-16 00.20

That is, it selects basic-linear for alltoall when it should have picked bruck,
and the result is poor performance, most visibly from 256 B upward:

as-shipped:

 $ mpirun -np 32 -host tbox13,tbox14,tbox15,tbox16 ./alltoall.openmpi133rc1 \
     profile.short-small
 running in profile-from-file mode
 bw for   10000   x 0 B :   0.0 bytes/s   time was: 142.1 us
 bw for   10000   x 1 B :   2.8 Mbytes/s          time was: 224.0 ms
 bw for   10000   x 2 B :   5.5 Mbytes/s          time was: 225.5 ms
 bw for   10000   x 4 B :  11.0 Mbytes/s          time was: 225.6 ms
 bw for   10000   x 8 B :  23.6 Mbytes/s          time was: 210.2 ms
 bw for   10000   x 16 B :  44.1 Mbytes/s         time was: 224.9 ms
 bw for   10000   x 32 B :  79.2 Mbytes/s         time was: 250.7 ms
 bw for   10000   x 64 B : 132.0 Mbytes/s         time was: 300.6 ms
 bw for   10000   x 128 B : 195.7 Mbytes/s        time was: 405.4 ms
 bw for   10000   x 256 B :  11.4 Mbytes/s        time was:  14.0 s
 bw for   10000   x 512 B :  24.1 Mbytes/s        time was:  13.2 s
 bw for   10000   x 1024 B :  53.6 Mbytes/s       time was:  11.9 s
 totaltime was:  41.0 s

forcing bruck:

 $ mpirun -np 32 -mca coll_tuned_alltoall_algorithm 3 \
     -mca coll_tuned_use_dynamic_rules 1 \
     -host tbox13,tbox14,tbox15,tbox16 ./alltoall.openmpi133rc1 profile.short-small
 running in profile-from-file mode
 bw for   10000   x 0 B :   0.0 bytes/s   time was: 142.1 us
 bw for   10000   x 1 B :   3.5 Mbytes/s          time was: 176.8 ms
 bw for   10000   x 2 B :   6.9 Mbytes/s          time was: 179.4 ms
 bw for   10000   x 4 B :  13.4 Mbytes/s          time was: 184.5 ms
 bw for   10000   x 8 B :  24.3 Mbytes/s          time was: 203.8 ms
 bw for   10000   x 16 B :  45.3 Mbytes/s         time was: 219.0 ms
 bw for   10000   x 32 B :  81.0 Mbytes/s         time was: 245.1 ms
 bw for   10000   x 64 B : 134.1 Mbytes/s         time was: 295.9 ms
 bw for   10000   x 128 B : 198.3 Mbytes/s        time was: 400.2 ms
 bw for   10000   x 256 B : 233.8 Mbytes/s        time was: 679.0 ms
 bw for   10000   x 512 B : 281.5 Mbytes/s        time was:   1.1 s
 bw for   10000   x 1024 B : 292.1 Mbytes/s       time was:   2.2 s
 totaltime was:   5.9 s
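
To put the gap in numbers, here is a quick sketch that divides the as-shipped
time by the forced-bruck time for the sizes where the two runs diverge. The
times are copied by hand from the two runs above and converted to seconds; the
speedup helper is just an illustrative name, not part of any tool.

```shell
# Ratio of as-shipped time to forced-bruck time, one decimal place.
# Arguments: <as-shipped seconds> <bruck seconds>
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

speedup 14.0 0.679   # 256 B
speedup 13.2 1.1     # 512 B
speedup 11.9 2.2     # 1024 B
speedup 41.0 5.9     # total time
```

That works out to roughly a factor of 20 at 256 B, 12 at 512 B, a bit over 5
at 1024 B, and about 7 on the total run time.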

I didn't follow up on this, thinking it had been solved...
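
For anyone who wants to try the forced setting without changing the mpirun
command line, the same two MCA parameters can be set through the environment
using the OMPI_MCA_ prefix. This is only a sketch: it assumes the variables are
exported in the shell that launches mpirun and that mpirun forwards them to the
other nodes, which is how I understand it to behave but which I have not
checked against this release candidate.

```shell
# Same settings as "-mca coll_tuned_use_dynamic_rules 1
#                   -mca coll_tuned_alltoall_algorithm 3" on the command line.
export OMPI_MCA_coll_tuned_use_dynamic_rules=1
export OMPI_MCA_coll_tuned_alltoall_algorithm=3

# Then launch exactly as in the as-shipped case, for example:
# mpirun -np 32 -host tbox13,tbox14,tbox15,tbox16 ./alltoall.openmpi133rc1 profile.short-small
```

Unsetting both variables returns to the default algorithm selection.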

/Peter

