update:

It's probably a bug. Building with foss/2016b or later works fine.
The strange thing is that I have the same OpenMPI version as foss/2016a
(1.10.2) compiled with GCC 4.4.7 (the one shipped with CentOS), and that
build works.

Conclusion: I'll start using foss/2016b and foss/2017a.

Best

2017-02-23 11:13 GMT+01:00 Yann Sagon <[email protected]>:

> Hello Guilherme,
>
> the workaround indeed works like a charm :)
>
> As I have this problem with other MPI software such as Rmpi, I continued to
> investigate.
> In my case the MPI subsystem is available on the node where I compile.
> It appears that since I'm compiling OpenMPI with --with-pmi and
> --with-slurm, I'm not able to launch any MPI binary in singleton mode (i.e.
> without using mpirun or similar), and this is the problem when EasyBuild
> tries to execute an MPI-enabled binary without mpirun. Maybe it's a bug in
> OpenMPI as well. I have tried with foss/2017a and it works fine. But then I
> have another problem: Theano supports CUDA, but CUDA doesn't support
> GCC > 5.
>
> So the question is: what should I do to make singleton mode work with the
> OpenMPI provided by foss/2016a or similar? Do other people here using Slurm
> and PMI have the same issue?
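One way to narrow this down is to check whether PMI- or Slurm-related environment variables are leaking into the shell where the singleton binary runs; if they are, OpenMPI may attempt a PMI-based init instead of a clean singleton init. This is a hypothetical diagnostic sketch, not something from the thread, and the exact variable names Slurm/PMI set vary by version:

```python
import os

# Variable-name prefixes that Slurm/PMI typically set inside a job step.
# Any of these present in an interactive shell could plausibly confuse
# an OpenMPI build configured with --with-pmi.
SUSPECT_PREFIXES = ("PMI_", "PMI2_", "SLURM_", "OMPI_")

def suspect_env_vars(environ=os.environ):
    """Return sorted names of environment variables that look PMI/Slurm-related."""
    return sorted(k for k in environ if k.startswith(SUSPECT_PREFIXES))

if __name__ == "__main__":
    leaked = suspect_env_vars()
    if leaked:
        print("Possibly interfering variables:", ", ".join(leaked))
    else:
        print("No PMI/Slurm variables in the environment.")
```

Running this inside an `sbatch`/`srun` context versus a clean login shell should show the difference.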
>
> Thanks
>
> 2017-02-22 15:43 GMT+01:00 Peretti-Pezzi Guilherme <[email protected]>:
>
>> Hi
>>
>> We have a similar problem when building Theano on login nodes: the MPI
>> subsystem is not available there, so the sanity check fails.
>>
>> Our workaround (not approved by Kenneth) is to skip the sanity check, see
>> here:
>> https://github.com/eth-cscs/production/pull/84/files#diff-4363a39e679c51f12df983df71d3eaa9R33
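For reference, a workaround of this kind can be expressed as an easyconfig fragment. This is a hypothetical sketch, not the exact change from the linked PR; it assumes EasyBuild's `skipsteps` easyconfig parameter, which takes a list of build-step names to skip (and skipping the sanity check is precisely what is disputed here):

```python
# Hypothetical easyconfig fragment: skip the sanity-check step entirely,
# so the "import theano" check never runs on the login node.
# Only this added line matters; the rest of the easyconfig stays unchanged.
skipsteps = ['sanitycheck']
```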
>>
>> --
>> Cheers,
>> G.
>>
>> From: <[email protected]> on behalf of Yann Sagon <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday 22 February 2017 14:38
>> To: "[email protected]" <[email protected]>
>> Subject: Re: [easybuild] Theano orte_init
>>
>> No, it's not submitted to a scheduler (the scheduler is configured in
>> EasyBuild, but I'm not using it, i.e. not using --job).
>>
>> 2017-02-22 12:52 GMT+01:00 Robert Schmidt <[email protected]>:
>>
>>> As the default sanity check for Python packages, EasyBuild tries to
>>> import the main module. In this case it seems the import alone was enough
>>> to trigger an MPI failure.
>>>
>>> It is a bit odd that a plain import causes that problem; maybe there is
>>> something else in the environment that makes it think it is being run as
>>> part of an MPI job? Is the build being submitted to a scheduler?
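The default check Robert describes boils down to running the import in a fresh interpreter and inspecting the exit code. A minimal stand-alone sketch of that idea (the helper name is hypothetical, and `math` stands in for `theano` so the snippet runs anywhere):

```python
import subprocess
import sys

def import_sanity_check(module_name, python=sys.executable):
    """Mimic EasyBuild's default Python sanity check: try to import the
    package's main module in a fresh interpreter and report success."""
    result = subprocess.run(
        [python, "-c", "import %s" % module_name],
        capture_output=True, text=True,
    )
    # A non-zero exit code (e.g. after the orte_init banner on stderr)
    # means the sanity check failed.
    return result.returncode == 0

if __name__ == "__main__":
    print(import_sanity_check("math"))  # stdlib module: expected to pass
```

In the failing case above, the subprocess is exactly `python -c "import theano"`, and the orte_init error makes it exit non-zero.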
>>>
>>> On Wed, Feb 22, 2017 at 5:36 AM Yann Sagon <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm back again with my problem (I had a similar issue with Rmpi);
>>>> maybe someone has an idea of what is going on.
>>>>
>>>> I tried to install Theano-0.8.2-foss-2016a-Python-2.7.11.eb and got
>>>> the following error:
>>>>
>>>>
>>>> == 2017-02-22 11:19:25,788 build_log.py:147 ERROR EasyBuild crashed
>>>> with an error (at ?:124 in __init__): Sanity check failed: Theano failed to
>>>> install, cmd '/opt/ebsofts/MPI/GCC/4.9.3-2.25/OpenMPI/1.10.2/Python/2.7.11/bin/python -c "import theano"' (stdin: None) output:
>>>> --------------------------------------------------------------------------
>>>> It looks like orte_init failed for some reason; your parallel process is
>>>> likely to abort.  There are many reasons that a parallel process can
>>>> fail during orte_init; some of which are due to configuration or
>>>> environment problems.  This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open MPI developer):
>>>>
>>>>   PMI2_Job_GetId failed failed
>>>>   --> Returned value (null) (14) instead of ORTE_SUCCESS
>>>>
>>>> It's worth mentioning that I compiled OpenMPI with the following
>>>> additional flags:
>>>>
>>>> configopts += '--with-slurm --with-pmi '
>>>>
>>>> Thanks for any suggestions
>>>>
>>>> --
>>>> Yann SAGON
>>>> Ingénieur système HPC
>>>> 24 Rue du Général-Dufour
>>>> 1211 Genève 4 - Suisse
>>>> Tél. : +41 (0)22 379 7737
>>>> [email protected] - www.unige.ch
>>>>
>>>
>>
>>
>>
>
>
>



-- 
Yann SAGON
Ingénieur système HPC
24 Rue du Général-Dufour
1211 Genève 4 - Suisse
Tél. : +41 (0)22 379 7737
[email protected] - www.unige.ch
