On Friday 10 July 2009, Jeff Squyres wrote:
> http://www.open-mpi.org/software/ompi/v1.3/
>
> Please test!
Built and ran just like(*) 1.3.2 on my limited tests (that is, worked quite
well).

OS:        CentOS-5.3.x86_64 with its own OFED
HW:        ConnectX-DDR on a Nehalem dual-quad platform
Size:      4 nodes
Compilers: Intel-11.0-074 (built with C/C++/F90, tested C and F90)

(*) It seems to still have the problem reported in:

  [OMPI users] scaling problem with openmpi
  From: Roman Martonak <r.marto...@gmail.com>
  To: us...@open-mpi.org
  Date: 2009-05-16 00.20

That is, it selects basic-linear for alltoall when it should have picked
bruck, and the result is suckish performance:

as-shipped:

$ mpirun -np 32 -host tbox13,tbox14,tbox15,tbox16 ./alltoall.openmpi133rc1 \
    profile.short-small
running in profile-from-file mode
bw for 10000 x    0 B :   0.0 bytes/s   time was: 142.1 us
bw for 10000 x    1 B :   2.8 Mbytes/s  time was: 224.0 ms
bw for 10000 x    2 B :   5.5 Mbytes/s  time was: 225.5 ms
bw for 10000 x    4 B :  11.0 Mbytes/s  time was: 225.6 ms
bw for 10000 x    8 B :  23.6 Mbytes/s  time was: 210.2 ms
bw for 10000 x   16 B :  44.1 Mbytes/s  time was: 224.9 ms
bw for 10000 x   32 B :  79.2 Mbytes/s  time was: 250.7 ms
bw for 10000 x   64 B : 132.0 Mbytes/s  time was: 300.6 ms
bw for 10000 x  128 B : 195.7 Mbytes/s  time was: 405.4 ms
bw for 10000 x  256 B :  11.4 Mbytes/s  time was: 14.0 s
bw for 10000 x  512 B :  24.1 Mbytes/s  time was: 13.2 s
bw for 10000 x 1024 B :  53.6 Mbytes/s  time was: 11.9 s
totaltime was: 41.0 s

forcing bruck:

$ mpirun -np 32 -mca coll_tuned_alltoall_algorithm 3 -mca \
    coll_tuned_use_dynamic_rules 1 -host \
    tbox13,tbox14,tbox15,tbox16 ./alltoall.openmpi133rc1 profile.short-small
running in profile-from-file mode
bw for 10000 x    0 B :   0.0 bytes/s   time was: 142.1 us
bw for 10000 x    1 B :   3.5 Mbytes/s  time was: 176.8 ms
bw for 10000 x    2 B :   6.9 Mbytes/s  time was: 179.4 ms
bw for 10000 x    4 B :  13.4 Mbytes/s  time was: 184.5 ms
bw for 10000 x    8 B :  24.3 Mbytes/s  time was: 203.8 ms
bw for 10000 x   16 B :  45.3 Mbytes/s  time was: 219.0 ms
bw for 10000 x   32 B :  81.0 Mbytes/s  time was: 245.1 ms
bw for 10000 x   64 B : 134.1 Mbytes/s  time was: 295.9 ms
bw for 10000 x  128 B : 198.3 Mbytes/s  time was: 400.2 ms
bw for 10000 x  256 B : 233.8 Mbytes/s  time was: 679.0 ms
bw for 10000 x  512 B : 281.5 Mbytes/s  time was: 1.1 s
bw for 10000 x 1024 B : 292.1 Mbytes/s  time was: 2.2 s
totaltime was: 5.9 s

I didn't follow up on this thinking it had been solved...

/Peter
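For anyone wanting to compare the two runs without eyeballing the columns, here is a minimal sketch that parses the "bw for ... time was: ..." lines and prints the per-size speedup from forcing bruck. The function names (parse_times, speedups) and the regular expression are my own assumptions about the output format shown above, not part of the benchmark or of Open MPI, and only the four largest sizes from the post are pasted in as sample data.

```python
import re

# Matches lines such as "bw for 10000 x 256 B : 11.4 Mbytes/s time was: 14.0 s".
# The pattern is an assumption based on the output quoted in this post.
LINE_RE = re.compile(
    r"bw for\s+(\d+)\s+x\s+(\d+)\s+B\s*:\s*[\d.]+\s+\S+\s+"
    r"time was:\s*([\d.]+)\s*(us|ms|s)\b"
)
UNIT_TO_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_times(text):
    """Return {message_size_in_bytes: elapsed_seconds} for each benchmark line."""
    times = {}
    for match in LINE_RE.finditer(text):
        size = int(match.group(2))
        times[size] = float(match.group(3)) * UNIT_TO_SECONDS[match.group(4)]
    return times


def speedups(baseline, forced):
    """Return {size: baseline_time / forced_time} for sizes present in both runs."""
    return {
        size: baseline[size] / forced[size]
        for size in sorted(baseline)
        if size in forced and forced[size] > 0
    }


# Sample rows copied from the two runs in this post (four largest sizes only).
AS_SHIPPED = """
bw for 10000 x 128 B : 195.7 Mbytes/s time was: 405.4 ms
bw for 10000 x 256 B : 11.4 Mbytes/s time was: 14.0 s
bw for 10000 x 512 B : 24.1 Mbytes/s time was: 13.2 s
bw for 10000 x 1024 B : 53.6 Mbytes/s time was: 11.9 s
"""
FORCED_BRUCK = """
bw for 10000 x 128 B : 198.3 Mbytes/s time was: 400.2 ms
bw for 10000 x 256 B : 233.8 Mbytes/s time was: 679.0 ms
bw for 10000 x 512 B : 281.5 Mbytes/s time was: 1.1 s
bw for 10000 x 1024 B : 292.1 Mbytes/s time was: 2.2 s
"""

if __name__ == "__main__":
    ratios = speedups(parse_times(AS_SHIPPED), parse_times(FORCED_BRUCK))
    for size, ratio in ratios.items():
        print(f"{size:5d} B: {ratio:5.1f}x faster with bruck forced")
```

On these rows the two runs are essentially tied at 128 B, and the gap opens at 256 B, where the as-shipped run takes 14.0 s against 679.0 ms with bruck forced (roughly 20x).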