It sounds to me like the issue described is similar to https://github.com/STEllAR-GROUP/hpx/pull/3948. Something changed in how we retrieve the number of localities (the issue with alps was discovered on 1.3.0).
Hartmut Kaiser <[email protected]> schrieb am Fr., 5. Juli 2019 18:51: > Andy, > > Could you please list your MPI env variables here? The number of localities > should be picked up from them before the parcelport is initialized. > Something goes wrong there, so it assumes that networking should not be > enabled at all... > > Thanks! > Regards Hartmut > --------------- > http://stellar.cct.lsu.edu > https://github.com/STEllAR-GROUP/hpx > > > > -----Original Message----- > > From: [email protected] <hpx-users- > > [email protected]> On Behalf Of Andreas Schäfer > > Sent: Friday, July 5, 2019 5:14 AM > > To: [email protected] > > Subject: Re: [hpx-users] Debugging bad locality number on MPI? > > > > Thanks Thomas! > > > > The MPI environment variables are set up correctly. I dug a bit deeper > and > > figured out that HPX isn't even enabling networking. In > > hpx/src/util/runtime_configuration.cpp the function > > enable_networking() only enables networking if one of the following > > conditions is true: > > > > a) the number of localities is > 1 > > b) the node number (is this the rank?) is > 0 > > c) the number of expected localities is != 0 > > d) the runtime mode is not console > > > > In my case the values are > > a) 1 > > b) -1 > > c) 0 > > d) console > > > > These seem to be correct, since HPX can't know the number of localities > > prior to initializing the MPI parcelport. > > (enable_networking() is run when the parcelhandler is created, but before > > the parcelports are instantiated. > > > > I'll be honest: I don't quite understand the logic there, but if I change > > the code to enable networking in console mode, or if I enable networking > > if the node number equals -1, then it works. > > > > Looks like the this was introduced with https://github.com/STEllAR- > > GROUP/hpx/commit/ffb8470e6a1143e9e1c95c39eff58eec322148d3#diff- > > 86850321552971332a5de979a34bd259 > > > > Thoughts? > > > > Thanks! 
> > -Andi
> >
> >
> > On 10:45 Fri 05 Jul, Thomas Heller wrote:
> > > The relevant parts in the codebase are here:
> > > Setting the env to check via cmake:
> > > https://github.com/STEllAR-GROUP/hpx/blob/master/plugins/parcelport/mpi/CMakeLists.txt#L42
> > > Detecting if we run inside a MPI environment:
> > > https://github.com/STEllAR-GROUP/hpx/blob/master/plugins/parcelport/mpi/mpi_environment.cpp#L30-L52
> > >
> > > On Fri, Jul 5, 2019 at 10:41 AM Thomas Heller <[email protected]> wrote:
> > >
> > > > The initialization of the MPI parcelport is done by checking
> > > > environment variables set by the most common MPI implementations.
> > > > Which MPI implementation do you use? Can you attach the output of
> > > > `mpirun env` maybe?
> > > >
> > > > Andreas Schäfer <[email protected]> wrote on Fri., 5 Jul 2019 08:58:
> > > >
> > > >> On 01:11 Fri 05 Jul, Marcin Copik wrote:
> > > >> > I've seen such an issue in pure MPI applications. The usual reason
> > > >> > is using an mpirun/mpiexec provided by an implementation
> > > >> > different from the one used for linking. Checking for a mismatch
> > > >> > there might help.
> > > >>
> > > >> There is only one MPI version installed on that machine. Also,
> > > >> running a simple MPI hello world works as expected. My assumption
> > > >> is that the MPI parcelport is not initialized correctly. Which part
> > > >> of the code loads/initializes all the parcelports?
> > > >>
> > > >> Thanks
> > > >> -Andi
> > > >>
> > > >>
> > > >> --
> > > >> ==========================================================
> > > >> Andreas Schäfer
> > > >>
> > > >> HPC and Supercomputing for Computer Simulations
> > > >> LibGeoDecomp project lead, http://www.libgeodecomp.org
> > > >>
> > > >> PGP/GPG key via keyserver
> > > >>
> > > >> I'm an SRE @ Google, this is a private account though.
> > > >> All mails are my own and not Google's.
> > > >> ==========================================================
> > > >>
> > > >> (\___/)
> > > >> (+'.'+)
> > > >> (")_(")
> > > >> This is Bunny. Copy and paste Bunny into your signature to help him
> > > >> gain world domination!
> > > >> _______________________________________________
> > > >> hpx-users mailing list
> > > >> [email protected]
> > > >> https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
