This looks like an Open MPI-specific question (I barely monitor this email
list; I only saw this post by pure chance).
Can you ping us over on the Open MPI mailing list with this question? There
are more people there who can help you.
http://www.open-mpi.org/community/lists/ompi.php
Thanks!
On Feb 26, 2011, at 4:05 PM, Jagga Soorma wrote:
> Hello,
>
> I am running into the following issue while trying to run osu_latency:
>
> --
> -bash-3.2$ mpiexec --mca btl openib,self -mca btl_openib_warn_default_gid_prefix 0 -np 2 --hostfile mpihosts /home/jagga/osu-micro-benchmarks-3.3/openmpi/ofed-1.5.2/bin/osu_latency
> # OSU MPI Latency Test v3.3
> # Size Latency (us)
> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all] error modifing QP to RTR errno says Invalid argument
> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:815:rml_recv_cb] error in endpoint reply start connect
> --------------------------------------------------------------------------
> mpiexec has exited due to process rank 1 with PID 6781 on
> node amber04 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --------------------------------------------------------------------------
> --
>
> I can get around this by adding the "--mca btl_openib_cpc_include rdmacm"
> option. However, I have another host with a different HCA with all the same
> drivers and software versions that I can run this same command successfully
> with using the rdmacm option. What could be causing one of my environments
> to fail but the other to work fine (without the rdmacm option)?
>
> --
> [root@amber03 ~]# ofed_info | grep OFED
> MLNX_OFED_LINUX-1.5.2-1.0.0 (OFED-1.5.2-20101020-1520):
> MLNX_OFED_LINUX-1.5.2-1.0.0
> (/mswg/release/ofed-1.5.2-rpms/rnfs-utils/rnfs-utils-1.1.5-10.OFED.src.rpm):
>
> [root@amber03 ~]# ibv_devinfo
> hca_id: mlx4_0
>     transport:          InfiniBand (0)
>     fw_ver:             2.7.9294
>     node_guid:          78e7:d103:0021:8884
>     sys_image_guid:     78e7:d103:0021:8887
>     vendor_id:          0x02c9
>     vendor_part_id:     26438
>     hw_ver:             0xB0
>     board_id:           HP_0200000003
>     phys_port_cnt:      2
>         port: 1
>             state:      PORT_ACTIVE (4)
>             max_mtu:    2048 (4)
>             active_mtu: 2048 (4)
>             sm_lid:     1
>             port_lid:   20
>             port_lmc:   0x00
>             link_layer: IB
>
>         port: 2
>             state:      PORT_ACTIVE (4)
>             max_mtu:    2048 (4)
>             active_mtu: 1024 (3)
>             sm_lid:     0
>             port_lid:   0
>             port_lmc:   0x00
>             link_layer: Ethernet
> --
>
> Any help would be greatly appreciated.
>
> Thanks,
> -J
> _______________________________________________
> ewg mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
--
Jeff Squyres
[email protected]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/