Hi

We have a similar problem when building Theano on login nodes: the MPI 
subsystem is not available there, so the sanity check fails.

Our workaround (not approved by Kenneth) is to skip the sanity check, see here:
https://github.com/eth-cscs/production/pull/84/files#diff-4363a39e679c51f12df983df71d3eaa9R33
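
For readers who cannot open the link, here is a minimal sketch of what such a workaround could look like. It assumes the generic `skipsteps` easyconfig parameter is used; the linked change may achieve the same thing differently, and the fragment below is not a complete easyconfig.

```python
# Fragment of an easyconfig (easyconfigs use Python syntax).
# Only the lines relevant to the workaround are shown.
name = 'Theano'
version = '0.8.2'

# Assumption: skip the whole sanity check step, so that
# `python -c "import theano"` is never run on a node without a
# working MPI runtime. This also skips every other sanity check.
skipsteps = ['sanitycheck']
```

Note that this hides the failure rather than fixing it, so the installation should still be tested on a compute node afterwards.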

--
Cheers,
G.

From: <[email protected]> on behalf of Yann Sagon <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday 22 February 2017 14:38
To: "[email protected]" <[email protected]>
Subject: Re: [easybuild] Theano orte_init

No, it's not submitted to a scheduler (a scheduler is configured in EasyBuild, 
but I'm not using it, i.e. I'm not using --job).

2017-02-22 12:52 GMT+01:00 Robert Schmidt <[email protected]>:
As a default sanity check for Python packages, EasyBuild tries to import the 
main module. In this case it seems the import alone was enough to trigger an 
MPI failure.
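
The import check described above can be reproduced by hand, outside EasyBuild. The following is only a sketch of the idea, not EasyBuild's actual implementation; the helper name `import_check` is made up for illustration, and it is written for Python 3 even though the build in this thread uses Python 2.7.

```python
import subprocess
import sys


def import_check(module_name, python=sys.executable):
    """Run `python -c "import <module_name>"`; return (ok, combined output)."""
    proc = subprocess.run(
        [python, "-c", "import %s" % module_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode == 0, proc.stdout


if __name__ == "__main__":
    # Pass the module to test on the command line, e.g. "theano".
    module = sys.argv[1] if len(sys.argv) > 1 else "json"
    ok, output = import_check(module)
    print("import %s: %s" % (module, "OK" if ok else "FAILED"))
    if not ok:
        print(output)
```

Running this with the Python interpreter from the failing installation shows whether the orte_init message appears independently of EasyBuild.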

It is a bit weird that the import is enough to cause that problem, but maybe 
there is something else in the environment that makes it think it is being run 
as part of an MPI job? Is the build being submitted to a scheduler?
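
One way to check the environment question is to list variables that a scheduler or MPI launcher typically sets. This is a rough sketch: the prefixes below are an assumption about what is worth looking at, not a definitive list of what Open MPI inspects, and `scheduler_like_vars` is a made-up helper name.

```python
import os

# Assumed prefixes commonly associated with Slurm, PMI and Open MPI.
PREFIXES = ("SLURM_", "PMI_", "PMI2_", "OMPI_")


def scheduler_like_vars(environ=None):
    """Return the environment variables whose names start with PREFIXES."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if k.startswith(PREFIXES)}


if __name__ == "__main__":
    found = scheduler_like_vars()
    if not found:
        print("no scheduler/MPI-related variables found")
    for key, value in sorted(found.items()):
        print("%s=%s" % (key, value))
```

An empty result in the shell used for the build would suggest the failure is not caused by leftover job variables.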

On Wed, Feb 22, 2017 at 5:36 AM Yann Sagon <[email protected]> wrote:
Hello,

I'm back with my problem (I had a similar one with Rmpi); maybe someone has 
an idea of what is going on.

I tried to install Theano-0.8.2-foss-2016a-Python-2.7.11.eb and had the 
following error:


== 2017-02-22 11:19:25,788 build_log.py:147 ERROR EasyBuild crashed with an 
error (at ?:124 in __init__): Sanity check failed: Theano failed to install, 
cmd '/opt/ebsofts/MPI/GCC/4.9.3-2.25/OpenMPI/1.10.2/Python/2.7.11/bin/python -c 
"import theano"' (stdin: None) output: 
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PMI2_Job_GetId failed failed
  --> Returned value (null) (14) instead of ORTE_SUCCESS

It is worth mentioning that I compiled OpenMPI with the following additional 
flags:

configopts += '--with-slurm --with-pmi '

Thanks for any suggestions.

--
Yann SAGON
HPC Systems Engineer

24 Rue du Général-Dufour
1211 Genève 4 - Suisse
Tel.: +41 (0)22 379 7737
[email protected] - www.unige.ch
