Hi everybody,

I'm trying to run a sample program on two 16-core machines connected via
InfiniBand, using the following command:

  mpirun -np 20 -host *localhost*,*remotehost* --mca shmem_base_verbose 10 --mca btl self,sm,openib test

The command fails with the following output:

[cn18:72296] mca: base: components_register: registering shmem components
[cn18:72296] mca: base: components_open: opening shmem components
[cn18:72296] shmem: base: runtime_query: Auto-selecting shmem components
[cn18:72296] shmem: base: runtime_query: (shmem) No component selected!
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------

I dug into the code and found that the loop in that function is never
entered, which apparently means that no suitable component was even
found.
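
To make that concrete, here is a minimal sketch of the pattern as I
understand it. It is not the actual Open MPI code: the type
sketch_component_t and the function sketch_select are hypothetical names
of mine, and the real selection logic may well differ. It only shows why
an empty candidate list ends in a -1 return value without the loop body
ever running.

```c
#include <stddef.h>

/* Hypothetical stand-in for a component; not an Open MPI type. */
typedef struct {
    const char *name;
    int priority; /* higher value wins */
    int usable;   /* nonzero if the component reports it can run here */
} sketch_component_t;

/*
 * Pick the usable component with the highest priority.
 * Returns 0 and sets *best on success, -1 if nothing was selected.
 * With count == 0 the loop body never runs, so the result is always -1.
 */
int sketch_select(const sketch_component_t *list, size_t count,
                  const sketch_component_t **best)
{
    *best = NULL;
    for (size_t i = 0; i < count; ++i) {
        if (!list[i].usable) {
            continue; /* skip components that cannot run */
        }
        if (*best == NULL || list[i].priority > (*best)->priority) {
            *best = &list[i];
        }
    }
    return (*best != NULL) ? 0 : -1;
}
```

Calling sketch_select(NULL, 0, &best) returns -1 with best left as NULL,
which is the situation I seem to be hitting: the failure happens because
there is nothing to iterate over, not because a candidate was rejected.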

Please note that a sample Hello World application using shared memory
runs perfectly. Excluding sm from the command line doesn't solve the problem.

Any hints? Has anyone experienced something similar?

Thank you.
-- 
*Gianmario Pozzi*
*M.Sc. @ Politecnico di Milano*
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
