Hi Gianmario,

Something probably went wrong at the spml layer.
Could you also add --mca spml_base_verbose 10
to the job launch line?
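
As a rough sketch, assuming the rest of the launch line stays exactly as in
the original message quoted below (same hosts, process count, and test binary),
the full command would look something like this:

```shell
# Original launch line from the report, with spml verbosity added.
# Hosts, -np value, and the "test" binary are taken from the quoted message.
mpirun -np 20 -host localhost,remotehost \
    --mca shmem_base_verbose 10 \
    --mca spml_base_verbose 10 \
    --mca btl self,sm,openib test
```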

Howard

--
Howard Pritchard
HPC-DES
Los Alamos National Laboratory


From: devel <devel-boun...@lists.open-mpi.org> on behalf of Gianmario Pozzi 
<pozzigma...@gmail.com>
Reply-To: Open MPI Developers <devel@lists.open-mpi.org>
Date: Tuesday, November 15, 2016 at 5:32 AM
To: "devel@lists.open-mpi.org" <devel@lists.open-mpi.org>
Subject: [OMPI devel] Failure while loading shmem module

Hi everybody,

I'm trying to run a sample program on two 16-core machines connected with IB 
(command:  mpirun -np 20 -host localhost,remotehost --mca shmem_base_verbose 10 
--mca btl self,sm,openib test).

This command fails saying:

[cn18:72296] mca: base: components_register: registering shmem components
[cn18:72296] mca: base: components_open: opening shmem components
[cn18:72296] shmem: base: runtime_query: Auto-selecting shmem components
[cn18:72296] shmem: base: runtime_query: (shmem) No component selected!
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------

I dug into the code and found that the loop in that function is never 
entered, which apparently means that no suitable component was found at all.

Please note that a sample Hello world application using shared memory runs 
perfectly. Excluding sm from the command line doesn't solve the problem.

Any hints? Has any of you ever experienced something similar?

Thank you.
--
Gianmario Pozzi
M.Sc. @ Politecnico di Milano
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
