Try adding "--map-by node" to your command line to ensure the procs really
are running on separate nodes.
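
For example, reusing your RoCE invocation from below (just a sketch, adjust as
needed for your setup):

    # one rank per node; same BTL settings as your RoCE run
    mpirun --hostfile hosts -np 2 --map-by node \
        --mca btl openib,self,sm --mca btl_openib_cpc_include rdmacm osu_latency

You can also sanity-check the placement first with something like:

    mpirun --hostfile hosts -np 2 --map-by node hostname

which should print ib03 and ib04, one per rank. If both ranks were landing on
the same host, the benchmark would only be exercising the shared-memory (sm)
path, which would explain why your TCP/IP and RoCE numbers look identical.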



On Thu, Mar 27, 2014 at 1:40 AM, Wang,Yanfei(SYS) <wangyanfe...@baidu.com> wrote:

>  Hi,
>
>
>
> HW Test Topology:
>
> IP: 192.168.72.3/24 -- 192.168.72.4/24, with VLAN and RoCE enabled
>
> IB03 server 40G port --- 40G Ethernet switch --- IB04 server 40G port:
> configured as the RoCE link
>
> IP: 192.168.71.3/24 -- 192.168.71.4/24
>
> IB03 server 10G port --- 10G Ethernet switch --- IB04 server 10G port:
> configured as a normal TCP/IP Ethernet link (server management interface)
>
>
>
> MPI configuration:
>
> *MPI hosts file:*
>
> [root@bb-nsi-ib04 pt2pt]# cat hosts
>
> ib03 slots=1
>
> ib04 slots=1
>
> *DNS hosts*
>
> [root@bb-nsi-ib04 pt2pt]# cat /etc/hosts
>
> 192.168.71.3 ib03
>
> 192.168.71.4 ib04
>
> [root@bb-nsi-ib04 pt2pt]#
>
> This configuration provides two nodes for the MPI latency evaluation.
>
>
>
> Benchmark:
>
> osu-micro-benchmarks-4.3
>
>
>
> Result:
>
> a. Send traffic over the 10G TCP/IP link using the following
> /etc/hosts file:
>
>
>
> [root@bb-nsi-ib04 pt2pt]# cat /etc/hosts
>
> 192.168.71.3 ib03
>
> 192.168.71.4 ib04
>
> The average osu_latency result is about 4.5 us; see the log below:
>
> [root@bb-nsi-ib04 pt2pt]# mpirun --hostfile hosts -np 2 osu_latency
>
> # OSU MPI Latency Test v4.3
>
> # Size          Latency (us)
>
> 0                       4.56
>
> 1                       4.90
>
> 2                       4.90
>
> 4                       4.60
>
> 8                       4.71
>
> 16                      4.72
>
> 32                      5.40
>
> 64                      4.77
>
> 128                     6.74
>
> 256                     7.01
>
> 512                     7.14
>
> 1024                    7.63
>
> 2048                    8.22
>
> 4096                   10.39
>
> 8192                   14.26
>
> 16384                  20.80
>
> 32768                  31.97
>
> 65536                  37.75
>
> 131072                 47.28
>
> 262144                 80.40
>
> 524288                137.65
>
> 1048576               250.17
>
> 2097152               484.71
>
> 4194304               946.01
>
>
>
> b. Send traffic over the RoCE link using the following /etc/hosts and mpirun
> --mca btl openib,self,sm …
>
> [root@bb-nsi-ib04 pt2pt]# cat /etc/hosts
>
> 192.168.72.3 ib03
>
> 192.168.72.4 ib04
>
> Result:
>
> [root@bb-nsi-ib04 pt2pt]# mpirun --hostfile hosts -np 2 --mca btl
> openib,self,sm --mca btl_openib_cpc_include rdmacm osu_latency
>
> # OSU MPI Latency Test v4.3
>
> # Size          Latency (us)
>
> 0                       4.83
>
> 1                       5.17
>
> 2                       5.12
>
> 4                       5.25
>
> 8                       5.38
>
> 16                      5.40
>
> 32                      5.19
>
> 64                      5.04
>
> 128                     6.74
>
> 256                     7.04
>
> 512                     7.34
>
> 1024                    7.91
>
> 2048                    8.17
>
> 4096                   10.39
>
> 8192                   14.22
>
> 16384                  22.05
>
> 32768                  31.68
>
> 65536                  37.57
>
> 131072                 48.25
>
> 262144                 79.98
>
> 524288                137.66
>
> 1048576               251.38
>
> 2097152               485.66
>
> 4194304               947.81
>
> [root@bb-nsi-ib04 pt2pt]#
>
>
>
> *Questions:*
>
> *1. Why do the two cases show a similar latency of about 5 us? That seems too
> low to believe. In our test environment it takes more than 50 us to handle a
> TCP SYN and return the SYN-ACK, and an x86 server takes more than 20 us on
> average to do IP forwarding (measured by a professional HW tester), so is
> this latency reasonable?*
>
> *2. Normally the switch alone introduces more than 1.5 us of switching time.
> Using Accelio, an open-source RDMA library released by Mellanox, a simple
> ping-pong test takes at least 4 us of round-trip latency. So a 5 us MPI
> latency (for both TCP/IP and RoCE) is rather hard to believe…*
>
> *3. The fact that the TCP/IP transport and the RoCE RDMA transport show the
> same latency is very puzzling.*
>
>
>
>
>
> *Before digging deeply into what happens inside the MPI benchmark, can you
> give us some suggestions? Is the mpirun command used correctly here?*
>
> *There must be some mistake in this test; please correct me.*
>
>
>
> *E.g., TCP SYN & SYN-ACK latency:*
>
>
>
> *Thanks *
>
> *-Yanfei*
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/03/14400.php
>
