Hi Gianmario,

Probably something went wrong at the spml layer. Could you also add --mca spml_base_verbose 10 to the job launch line?
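For illustration, assuming the rest of the launch line stays exactly as in the original report quoted below, the amended command might look like the following sketch. The host names and the "test" binary are copied from that report; the LAUNCH variable is only a hypothetical helper so the line can be reviewed before it is run.

```shell
# Launch line from the original report, with the spml verbosity option added.
# Stored in a variable (hypothetical name) and printed so it can be checked first.
LAUNCH="mpirun -np 20 -host localhost,remotehost --mca shmem_base_verbose 10 --mca spml_base_verbose 10 --mca btl self,sm,openib test"
echo "$LAUNCH"
# To actually launch the job, run the printed line, e.g.: eval "$LAUNCH"
```

Keeping shmem_base_verbose alongside the new option means the output of both frameworks would show up in the same run.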
Howard

--
Howard Pritchard
HPC-DES
Los Alamos National Laboratory

From: devel <devel-boun...@lists.open-mpi.org> on behalf of Gianmario Pozzi <pozzigma...@gmail.com>
Reply-To: Open MPI Developers <devel@lists.open-mpi.org>
Date: Tuesday, November 15, 2016 at 5:32 AM
To: "devel@lists.open-mpi.org" <devel@lists.open-mpi.org>
Subject: [OMPI devel] Failure while loading shmem module

Hi everybody,

I'm trying to run a sample program on two 16-core machines connected with IB (command: mpirun -np 20 -host localhost,remotehost --mca shmem_base_verbose 10 --mca btl self,sm,openib test).

This command fails saying:

[cn18:72296] mca: base: components_register: registering shmem components
[cn18:72296] mca: base: components_open: opening shmem components
[cn18:72296] shmem: base: runtime_query: Auto-selecting shmem components
[cn18:72296] shmem: base: runtime_query: (shmem) No component selected!
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------

I dove into the code and found out that the loop in that function is never entered, which apparently means that no suitable component was even found.

Please note that a sample Hello world application using shared memory runs perfectly.

Excluding sm from the command line doesn't solve the problem.

Any hint?
Did any of y'all ever experience something similar?

Thank you.

--
Gianmario Pozzi
M.Sc. @ Politecnico di Milano
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel