From time to time, I need to run Open MPI apps using the openib BTL on a single node, where port 1 on the HCA is connected to port 2 on the same HCA.
Using a vintage 1.5.4, my command line would read:

    mpiexec --mca btl self,openib --mca btl_openib_cpc_include oob \
        -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:1 ./a.out : \
        -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:2 ./a.out

I then needed a newer Open MPI, so I compiled and installed version 1.8.2. That is when the problems began ;-)

Apparently, the old (and in my opinion nice) "oob" connection management method has disappeared. However, when I modify the command line to:

    mpiexec --mca btl self,openib --mca btl_openib_cpc_include udcm \
        -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:1 ./a.out : \
        -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:2 ./a.out

I get tons of:

    connect/btl_openib_connect_udcm.c:1390:udcm_find_endpoint] could not find endpoint with port: 1, lid: 4608, msg_type: 100

Interestingly, the lid here is the lid for port 2 (when port numbers start at 1). I do suspect that the printout above counts ports from zero.

Anyway, must I go back to an older Open MPI that supports "oob", or do I have a flaw in my command line?

Thanks,
Håkon