Hello Guilherme,

The workaround indeed works like a charm :)

As I have the same problem with other MPI software such as Rmpi, I continued
to investigate.
In my case the MPI subsystem is available on the node where I compile.
It appears that because I compile OpenMPI with --with-pmi and
--with-slurm, I am not able to launch any MPI binary in singleton mode (i.e.
without mpirun or similar), and this is what fails when EasyBuild tries
to execute an MPI-enabled binary without mpirun. Maybe it's a bug in
OpenMPI as well. I have tried with foss/2017a and it works fine, but
then I have another problem: Theano supports CUDA, but CUDA doesn't support
GCC > 5.
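
For anyone who wants to reproduce this outside of EasyBuild, here is a
minimal sketch of the check, assuming Python 3.5+ is available to drive it.
It runs the same kind of command as the sanity check (python -c "import
<module>") directly, without mpirun, and looks for the "orte_init failed"
text from the error output. The function names and the SLURM_/PMI_/OMPI_
prefix list are my own choices for illustration, not anything EasyBuild or
OpenMPI provides.

```python
import os
import subprocess
import sys

# Phrase taken from the OpenMPI error output quoted in this thread.
ORTE_MARKER = "orte_init failed"


def run_import_check(module, python=sys.executable):
    """Run `<python> -c "import <module>"` directly, i.e. without mpirun.

    Returns (return code, combined stdout/stderr text).
    """
    proc = subprocess.run(
        [python, "-c", "import " + module],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def looks_like_orte_failure(output):
    """True if the output contains the orte_init failure message."""
    return ORTE_MARKER in output


def launcher_env(environ=None):
    """Return variables that could make a process believe it runs inside a job.

    The prefix list is an assumption for illustration, not an exhaustive set.
    """
    environ = os.environ if environ is None else environ
    prefixes = ("SLURM_", "PMI_", "OMPI_")
    return {k: v for k, v in environ.items() if k.startswith(prefixes)}
```

Calling run_import_check("theano", python="/path/to/bin/python") with the
interpreter from the failing module, then passing the output to
looks_like_orte_failure(), should tell whether the singleton launch is what
breaks. launcher_env() just lists what is set in the build environment.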

So the question is: what should I do to make singleton mode work with the
OpenMPI provided by foss/2016a or similar? Do other people here using Slurm
and PMI have the same issue?

Thanks

2017-02-22 15:43 GMT+01:00 Peretti-Pezzi Guilherme <[email protected]>:

> Hi
>
> We have a similar problem when building Theano on login nodes: the MPI
> subsystem is not available, so the sanity check fails.
>
> Our workaround (not approved by Kenneth) is to skip the sanity check, see
> here:
> https://github.com/eth-cscs/production/pull/84/files#diff-4363a39e679c51f12df983df71d3eaa9R33
>
> --
> Cheers,
> G.
>
> From: <[email protected]> on behalf of Yann Sagon <
> [email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday 22 February 2017 14:38
> To: "[email protected]" <[email protected]>
> Subject: Re: [easybuild] Theano orte_init
>
> No it's not submitted to a scheduler (the scheduler is configured in
> EasyBuild but I'm not using it, ie. not using --job)
>
> 2017-02-22 12:52 GMT+01:00 Robert Schmidt <[email protected]>:
>
>> As a default sanity check for python packages, easybuild will try to
>> import the main module. In this case it seems like the import was enough to
>> trigger an mpi failure.
>>
>> It is a bit weird that the import is enough to cause that problem, but
>> maybe there is something else in the environment that makes it think it is
>> being run as part of an MPI job? Is the build being submitted to a
>> scheduler?
>>
>> On Wed, Feb 22, 2017 at 5:36 AM Yann Sagon <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I'm back again with my problem (I had a similar problem with Rmpi),
>>> maybe someone has an idea of what is going on.
>>>
>>> I tried to install Theano-0.8.2-foss-2016a-Python-2.7.11.eb and had the
>>> following error:
>>>
>>>
>>> == 2017-02-22 11:19:25,788 build_log.py:147 ERROR EasyBuild crashed with
>>> an error (at ?:124 in __init__): Sanity check failed: Theano failed to
>>> install, cmd
>>> '/opt/ebsofts/MPI/GCC/4.9.3-2.25/OpenMPI/1.10.2/Python/2.7.11/bin/python -c "import theano"'
>>> (stdin: None) output:
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   PMI2_Job_GetId failed failed
>>>   --> Returned value (null) (14) instead of ORTE_SUCCESS
>>>
>>> It is worth mentioning that I have compiled OpenMPI with the following
>>> additional flags:
>>>
>>> configopts += '--with-slurm --with-pmi '
>>>
>>> Thanks for any suggestion
>>>
>>> --
>>> Yann SAGON
>>> Ingénieur système HPC
>>> 24 Rue du Général-Dufour
>>> 1211 Genève 4 - Suisse
>>> Tél. : +41 (0)22 379 7737
>>> [email protected] - www.unige.ch
>>>
>>
>
>
> --
> Yann SAGON
> Ingénieur système HPC
> 24 Rue du Général-Dufour
> 1211 Genève 4 - Suisse
> Tél. : +41 (0)22 379 7737
> [email protected] - www.unige.ch
>
>


-- 
Yann SAGON
Ingénieur système HPC
24 Rue du Général-Dufour
1211 Genève 4 - Suisse
Tél. : +41 (0)22 379 7737
[email protected] - www.unige.ch
