OK, good news: running the command as root works, so it seems to be a permissions issue.
That's odd, though, since it never happened on other configurations. Thank you for your time.

2016-11-16 11:23 GMT+01:00 Gianmario Pozzi <pozzigma...@gmail.com>:

> Hi Pritchard, thank you for replying.
>
> Adding the parameter you suggested changed nothing. Could it depend on the
> fact that I'm running v1.10.0rc7? It's a custom version, though we didn't
> modify any spml- or sm-related code.
>
> 2016-11-15 14:12 GMT+01:00 Pritchard Jr., Howard <howa...@lanl.gov>:
>
>> Hi Gianmario,
>>
>> Probably something went wrong at the spml layer.
>> Could you also add --mca spml_base_verbose 10
>> to the job launch line?
>>
>> Howard
>>
>> --
>> Howard Pritchard
>> HPC-DES
>> Los Alamos National Laboratory
>>
>>
>> From: devel <devel-boun...@lists.open-mpi.org> on behalf of Gianmario
>> Pozzi <pozzigma...@gmail.com>
>> Reply-To: Open MPI Developers <devel@lists.open-mpi.org>
>> Date: Tuesday, November 15, 2016 at 5:32 AM
>> To: "devel@lists.open-mpi.org" <devel@lists.open-mpi.org>
>> Subject: [OMPI devel] Failure while loading shmem module
>>
>> Hi everybody,
>>
>> I'm trying to run a sample program on two 16-core machines connected
>> with IB (command: mpirun -np 20 -host *localhost*,*remotehost* --mca
>> shmem_base_verbose 10 --mca btl self,sm,openib test).
>>
>> This command fails with:
>>
>> [cn18:72296] mca: base: components_register: registering shmem components
>> [cn18:72296] mca: base: components_open: opening shmem components
>> [cn18:72296] shmem: base: runtime_query: Auto-selecting shmem components
>> [cn18:72296] shmem: base: runtime_query: (shmem) No component selected!
>> --------------------------------------------------------------------------
>> It looks like opal_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during opal_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> opal_shmem_base_select failed
>> --> Returned value -1 instead of OPAL_SUCCESS
>> --------------------------------------------------------------------------
>>
>> I dove into the code and found that the loop in that function is never
>> entered, which apparently means that not a single suitable component was
>> found.
>>
>> Please note that a sample Hello World application using shared memory
>> runs perfectly. Excluding sm from the command line doesn't solve the
>> problem.
>>
>> Any hints? Have any of you ever experienced something similar?
>>
>> Thank you.
>> --
>> *Gianmario Pozzi*
>> *M.Sc. @ Politecnico di Milano*
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>
>
>
>
> --
> *Gianmario Pozzi*
> *M.Sc. @ Politecnico di Milano*
>

--
*Gianmario Pozzi*
*M.Sc. @ Politecnico di Milano*
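Because the job succeeds as root but fails as a normal user, it may help to check what the unprivileged user can actually do on each node. The Python sketch below is a hypothetical diagnostic. It is not part of Open MPI and does not reproduce how Open MPI's shmem framework selects a component. It tries to create a POSIX shared-memory segment, which is roughly what a POSIX-style shmem component needs. It also checks that the temporary directory is writable, since an mmap-style component typically backs its segment with a file there. Finally, it lists unreadable files in a plugin directory. The `plugin_dir` path is a placeholder assumption; point it at the `lib/openmpi` directory under your own install prefix.

```python
import os
import tempfile
from multiprocessing import shared_memory
from typing import List


def can_use_posix_shm(size: int = 4096) -> bool:
    """Create, write, read back, and remove a POSIX shared-memory segment."""
    try:
        shm = shared_memory.SharedMemory(create=True, size=size)
    except OSError:  # e.g. PermissionError or a missing /dev/shm
        return False
    try:
        shm.buf[0] = 42
        return shm.buf[0] == 42
    finally:
        shm.close()
        shm.unlink()


def can_write_tmpdir() -> bool:
    """Check that the current user may create files in the temp directory."""
    return os.access(tempfile.gettempdir(), os.W_OK | os.X_OK)


def unreadable_files(directory: str) -> List[str]:
    """Return regular files in `directory` that the current user cannot read."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and not os.access(path, os.R_OK):
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Run as the unprivileged user: root bypasses most of these permission checks.
    print("POSIX shm usable:  ", can_use_posix_shm())
    print("temp dir writable: ", can_write_tmpdir())
    # Hypothetical placeholder: replace with <install prefix>/lib/openmpi.
    plugin_dir = "/path/to/openmpi/lib/openmpi"
    if os.path.isdir(plugin_dir):
        print("unreadable plugins:", unreadable_files(plugin_dir))
```

Suppose a check returns `False` or reports a non-empty list for the regular user but passes as root. That result would point at the specific resource whose permissions differ from the configurations that work.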