OK, good news: running the command as root works, so it seems to be a
permissions issue.

That is still odd, though; it never happened on other configurations.
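
The thread does not identify which path had the wrong permissions, so a quick
check along these lines may help narrow it down. It is only a sketch: the
directory list (/dev/shm and the system temp directory) is an assumption about
where shared-memory backing files typically live on a Linux node, and
candidate_dirs and check_writable are hypothetical helper names, not part of
Open MPI.

```python
import os
import tempfile


def candidate_dirs():
    """Return directories where shared-memory backing files commonly live.

    Assumption: a typical Linux node. This list is illustrative and is not
    taken from Open MPI itself.
    """
    dirs = ["/dev/shm", tempfile.gettempdir()]
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(dirs))


def check_writable(path):
    """Try to create (and automatically remove) a temporary file in `path`.

    Returns (True, "") on success, or (False, reason) on failure.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    print(f"Checking as uid {os.getuid()}")
    for d in candidate_dirs():
        ok, reason = check_writable(d)
        status = "writable" if ok else f"NOT writable ({reason})"
        print(f"{d}: {status}")
```

Running it as the same non-root user that launches mpirun, on each node,
should show whether any of these locations rejects file creation.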

Thank you for your time.

2016-11-16 11:23 GMT+01:00 Gianmario Pozzi <pozzigma...@gmail.com>:

> Hi Pritchard, thank you for replying.
>
> Nothing changed after adding the parameter you suggested. Could it be
> because I'm running v.1.10.0rc7? It's a custom version, but we didn't
> modify any spml- or sm-related code.
>
> 2016-11-15 14:12 GMT+01:00 Pritchard Jr., Howard <howa...@lanl.gov>:
>
>> Hi Gianmario,
>>
>> Probably something went wrong at the spml layer.
>> Could you also add --mca spml_base_verbose 10
>> to the job launch line?
>>
>> Howard
>>
>> --
>> Howard Pritchard
>> HPC-DES
>> Los Alamos National Laboratory
>>
>>
>> From: devel <devel-boun...@lists.open-mpi.org> on behalf of Gianmario
>> Pozzi <pozzigma...@gmail.com>
>> Reply-To: Open MPI Developers <devel@lists.open-mpi.org>
>> Date: Tuesday, November 15, 2016 at 5:32 AM
>> To: "devel@lists.open-mpi.org" <devel@lists.open-mpi.org>
>> Subject: [OMPI devel] Failure while loading shmem module
>>
>> Hi everybody,
>>
>> I'm trying to run a sample program on two 16-core machines connected
>> with IB (command: mpirun -np 20 -host *localhost*,*remotehost* --mca
>> shmem_base_verbose 10 --mca btl self,sm,openib test).
>>
>> This command fails saying:
>>
>> [cn18:72296] mca: base: components_register: registering shmem components
>> [cn18:72296] mca: base: components_open: opening shmem components
>> [cn18:72296] shmem: base: runtime_query: Auto-selecting shmem components
>> [cn18:72296] shmem: base: runtime_query: (shmem) No component selected!
>> --------------------------------------------------------------------------
>> It looks like opal_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during opal_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>>   opal_shmem_base_select failed
>>   --> Returned value -1 instead of OPAL_SUCCESS
>> --------------------------------------------------------------------------
>>
>> I dug into the code and found that the loop in that function is never
>> entered, which apparently means that no suitable component was found
>> at all.
>>
>> Please note that a sample Hello World application using shared memory
>> runs perfectly. Excluding sm from the command line doesn't solve the
>> problem.
>>
>> Any hints? Has any of you ever experienced something similar?
>>
>> Thank you.
>> --
>> *Gianmario Pozzi*
>> *M.Sc. @ Politecnico di Milano*
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>
>
>
>
> --
> *Gianmario Pozzi*
> *M.Sc. @ Politecnico di Milano*
>



-- 
*Gianmario Pozzi*
*M.Sc. @ Politecnico di Milano*