Hi,

I was running IMB these days and noticed that Open MPI refuses to use the InfiniBand interconnects. I compiled Open MPI with:

../configure --prefix=... CC=/opt/intel/cce/10.1.015/bin/icc CXX=/opt/intel/cce/10.1.015/bin/icpc CPP="/opt/intel/cce/10.1.015/bin/icc -E" FC=/opt/intel/fce/10.1.015/bin/ifort F90=/opt/intel/fce/10.1.015/bin/ifort F77=/opt/intel/fce/10.1.015/bin/ifort --enable-mpi-f90 --with-tm=/usr/pbs/ --enable-mpi-threads=yes --enable-contrib-no-build=vt --with-openib=/usr/

However, I could never get InfiniBand to be used:

mpirun --mca btl openib,self,sm -np 2 --bynode /home_nfs/home_dichevk/tests/IMB/IMB_3.1/src-OpenMPI/IMB-MPI1

--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[5881,1],0]) is on host: nv11
  Process 2 ([[5881,1],1]) is on host: nv12
  BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[nv11:17093] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[nv12:24383] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 17093 on node nv11
exiting without calling "finalize". This may have caused other
processes in the application to be terminated by signals sent by
mpirun (as reported here).
--------------------------------------------------------------------------
[nv11:17092] 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
[nv11:17092] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[nv11:17092] 1 more process has sent help message help-mpi-runtime / mpi_init:startup:internal-failure

Then I noticed some code and comments in ompi/mca/btl/openib/btl_openib_component.c which seem to disable this component when MPI_THREAD_MULTIPLE is used for the initialization (as is the case with IMB). Is that intentional?
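In case it helps, a small test program along the following lines (my own sketch, not IMB code) should show whether requesting MPI_THREAD_MULTIPLE alone is what makes the openib BTL drop out. It only requests the same thread level via MPI_Init_thread and prints what was provided:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* Request MPI_THREAD_MULTIPLE, as IMB does in this build */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        printf("requested thread level %d, provided %d\n",
               MPI_THREAD_MULTIPLE, provided);
    }

    MPI_Finalize();
    return 0;
}

Launching this with the same mpirun line as above, and comparing against a variant that calls plain MPI_Init, should isolate the thread-level question from the benchmark itself.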
Best regards,
Kiril

--
Dipl.-Inf. Kiril Dichev
Tel.: +49 711 685 60492
E-mail: dic...@hlrs.de
High Performance Computing Center Stuttgart (HLRS)
Universität Stuttgart
70550 Stuttgart
Germany