Pavel,
MVAPICH implements multicore-optimized collectives, which perform substantially
better than the default algorithms.
FYI, the ORNL team is working on a new high-performance collectives framework for
OMPI. The framework provides a significant boost in collectives performance.
Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Mar 23, 2012, at 9:17 AM, Pavel Mezentsev wrote:
I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters; that's why I
didn't use --bind-to-core. I checked, and using --bind-to-core improved the result
compared to 1.5.4:
#repetitions t_min[usec] t_max[usec] t_avg[usec]
1000 84.96 85.08 85.02
So I guess that with 1.5.5 the processes migrate from core to core within a node
even though I use all the cores, right? Then why does 1.5.4 behave differently?
I need --bind-to-core in some cases, which is why I use 1.5.5rc3 instead of the
more stable 1.5.4. I know that I can use numactl explicitly, but --bind-to-core
is more convenient :)
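For reference, here is a minimal sketch of the explicit-numactl alternative,
assuming a hypothetical wrapper script bind.sh and the OMPI_COMM_WORLD_LOCAL_RANK
variable that mpirun exports to each rank:

  #!/bin/sh
  # bind.sh: pin this rank to one core before exec'ing the real binary.
  # OMPI_COMM_WORLD_LOCAL_RANK is the rank's index on its node (0..npernode-1),
  # so with -npernode 32 each local rank lands on its own core.
  exec numactl --physcpubind=$OMPI_COMM_WORLD_LOCAL_RANK "$@"

and then something like

  mpirun -hostfile hosts_all2all_2 -npernode 32 -np 256 ./bind.sh \
      openmpi-1.5.5rc3/intel12/IMB-MPI1 -npmin 256 barrier

but --bind-to-core gives the same pinning without the extra script.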
2012/3/23 Ralph Castain <[email protected]>
I don't see where you told OMPI to --bind-to-core. We don't automatically bind,
so you have to explicitly tell us to do so.
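For example, your 1.5.5rc3 command line with binding requested explicitly (only
--bind-to-core added; --report-bindings should print the resulting binding map so
you can verify where the ranks land):

  /opt/openmpi-1.5.5rc3/intel12/bin/mpirun --bind-to-core --report-bindings \
      -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_2 -npernode 32 \
      --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 \
      -mca coll_tuned_barrier_algorithm 1 -np 256 \
      openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 \
      -npmin 256 barrier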
On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:
> Hello
>
> I'm doing some testing with IMB and discovered a strange thing:
>
> Since I have a system with the new AMD Opteron 6276 processors, I'm using
> 1.5.5rc3 because it supports binding to cores.
>
> But when I run the barrier test from the Intel MPI Benchmarks, the best I get is:
> #repetitions t_min[usec] t_max[usec] t_avg[usec]
> 598 15159.56 15211.05 15184.70
> (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile
> hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256
> openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
> barrier)
>
> And with openmpi 1.5.4 the result is much better:
> #repetitions t_min[usec] t_max[usec] t_avg[usec]
> 1000 113.23 113.33 113.28
>
> (/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile
> hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256
> openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
> barrier)
>
> And still I couldn't come close to the result I got with MVAPICH2:
> #repetitions t_min[usec] t_max[usec] t_avg[usec]
> 1000 17.51 17.53 17.53
>
> (/opt/mvapich2-1.8/intel12/bin/mpiexec.hydra -env OMP_NUM_THREADS 1 -hostfile
> hosts_all2all_2 -np 256 mvapich2-1.8/intel12/IMB-MPI1 -mem 2 -off_cache 16,64
> -msglog 1:16 -npmin 256 barrier)
>
> I don't know whether this is a bug or whether I'm doing something the wrong way.
> Is there a way to improve my results?
>
> Best regards,
> Pavel Mezentsev
>
>