Try adding "--map-by node" to your command line to ensure the procs really are running on separate nodes.
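
For example, here is a minimal sketch that assumes the same hostfile (hosts) and process count as in your runs. Running hostname first as a placement check is my own suggestion, not something from your original commands: if the mapping works, the two ranks should report different hosts. The fallback branch only prints the commands when mpirun is not on the PATH.

```shell
# Hostfile name and process count are taken from the quoted post.
HOSTFILE=hosts
NP=2

# Placement check (an assumption, not from the original post):
# with --map-by node each rank should print a different hostname.
PLACEMENT_CMD="mpirun --hostfile $HOSTFILE -np $NP --map-by node hostname"

# The same benchmark as before, with ranks mapped round-robin by node.
LATENCY_CMD="mpirun --hostfile $HOSTFILE -np $NP --map-by node osu_latency"

if command -v mpirun >/dev/null 2>&1; then
    $PLACEMENT_CMD
    $LATENCY_CMD
else
    echo "mpirun not found; would run:"
    echo "  $PLACEMENT_CMD"
    echo "  $LATENCY_CMD"
fi
```

If both ranks print the same hostname, the latency numbers come from two procs on one node rather than from the network.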

On Thu, Mar 27, 2014 at 1:40 AM, Wang,Yanfei(SYS) <wangyanfe...@baidu.com> wrote:

> Hi,
>
> HW Test Topology:
>
> IP: 192.168.72.4/24 -- 192.168.72.4/24, enable vlan and RoCE
> IB03 server 40G port -- 40G Ethernet switch -- IB04 server 40G port:
> configure it as RoCE link
>
> IP: 192.168.71.3/24 -- 192.168.71.4/24
> IB03 server 10G port -- 10G Ethernet switch -- IB04 server 10G port:
> configure it as normal TCP/IP Ethernet link (server management interface)
>
> MPI configuration:
>
> MPI hosts file:
>
> [root@bb-nsi-ib04 pt2pt]# cat hosts
> ib03 slots=1
> ib04 slots=1
>
> DNS hosts:
>
> [root@bb-nsi-ib04 pt2pt]# cat /etc/hosts
> 192.168.71.3 ib03
> 192.168.71.4 ib04
> [root@bb-nsi-ib04 pt2pt]#
>
> This configuration will create 2 nodes for MPI latency evaluation.
>
> Benchmark:
>
> osu-micro-benchmarks-4.3
>
> Result:
>
> a. Send traffic over the 10G TCP/IP port, using the following
> /etc/hosts file:
>
> [root@bb-nsi-ib04 pt2pt]# cat /etc/hosts
> 192.168.71.3 ib03
> 192.168.71.4 ib04
>
> The average osu_latency latency is about 4.5 us; see the following log:
>
> [root@bb-nsi-ib04 pt2pt]# mpirun --hostfile hosts -np 2 osu_latency
> # OSU MPI Latency Test v4.3
> # Size       Latency (us)
> 0                    4.56
> 1                    4.90
> 2                    4.90
> 4                    4.60
> 8                    4.71
> 16                   4.72
> 32                   5.40
> 64                   4.77
> 128                  6.74
> 256                  7.01
> 512                  7.14
> 1024                 7.63
> 2048                 8.22
> 4096                10.39
> 8192                14.26
> 16384               20.80
> 32768               31.97
> 65536               37.75
> 131072              47.28
> 262144              80.40
> 524288             137.65
> 1048576            250.17
> 2097152            484.71
> 4194304            946.01
>
> b. Send traffic over the RoCE link, using the following /etc/hosts
> file and mpirun --mca btl openib,self,sm ...
>
> [root@bb-nsi-ib04 pt2pt]# cat /etc/hosts
> 192.168.72.3 ib03
> 192.168.72.4 ib04
>
> Result:
>
> [root@bb-nsi-ib04 pt2pt]# mpirun --hostfile hosts -np 2 --mca btl
> openib,self,sm --mca btl_openib_cpc_include rdmacm osu_latency
> # OSU MPI Latency Test v4.3
> # Size       Latency (us)
> 0                    4.83
> 1                    5.17
> 2                    5.12
> 4                    5.25
> 8                    5.38
> 16                   5.40
> 32                   5.19
> 64                   5.04
> 128                  6.74
> 256                  7.04
> 512                  7.34
> 1024                 7.91
> 2048                 8.17
> 4096                10.39
> 8192                14.22
> 16384               22.05
> 32768               31.68
> 65536               37.57
> 131072              48.25
> 262144              79.98
> 524288             137.66
> 1048576            251.38
> 2097152            485.66
> 4194304            947.81
> [root@bb-nsi-ib04 pt2pt]#
>
> Question:
>
> 1. Why do both runs have a similar latency of about 5 us, which is too
> small to believe? In our test environment it takes more than 50 us to
> handle a TCP SYN and return the SYN-ACK, and an x86 server takes more
> than 20 us on average to do IP forwarding (measured with a professional
> HW tester), so is this latency reasonable?
>
> 2. Normally the switch introduces more than 1.5 us of switching time.
> Using accelio, an open-source RDMA library released by Mellanox, a
> simple ping-pong test takes at least 4 us RTT latency. So the 5 us MPI
> latency above (TCP/IP and RoCE) is rather unbelievable.
>
> 3. The fact that the TCP/IP transport and the RoCE RDMA transport get
> the same latency is so puzzling.
>
> Before digging into what happens inside the MPI benchmark, can you give
> us some suggestions? Does the mpirun command work correctly here?
> There must be some mistake in this test; please correct me.
>
> Eg: TCP SYN & SYN-ACK latency:
>
> Thanks
> -Yanfei
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/03/14400.php