Okay, I added the warning here: https://github.com/open-mpi/ompi/pull/3778 

This is what it looks like for SLURM (slightly different error message for 
ALPS):

$ srun -n 1 ./mpi_spin
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[rhc001:189810] Local abort before MPI_INIT completed completed successfully, 
but am not able to aggregate error messages, and not able to guarantee that all 
other processes were killed!
srun: error: rhc001: task 0: Exited with exit code 1
$
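
To spell that out, the two build recipes the message points at look roughly
like this (the install paths below are just placeholders, adjust for your site):

  # SLURM 16.05 or later: rebuild SLURM itself with its PMIx plugin
  # (configure SLURM --with-pmix); Open MPI's own PMIx support then
  # handles the srun case.

  # SLURM older than 16.05: point Open MPI at SLURM's PMI-1/PMI-2 library:
  $ ./configure --prefix=<ompi-install-dir> --with-pmi=<slurm-install-dir>
  $ make install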


> On Jun 19, 2017, at 9:35 PM, Barrett, Brian via devel 
> <devel@lists.open-mpi.org> wrote:
> 
> By the way, there was a change between 2.x and 3.0.x:
> 
> 2.x:
> 
> Hello, world, I am 0 of 1, (Open MPI v2.1.2a1, package: Open MPI 
> bbarrett@ip-172-31-64-10 Distribution, ident: 2.1.2a1, repo rev: 
> v2.1.1-59-gdc049e4, Unreleased developer copy, 148)
> Hello, world, I am 0 of 1, (Open MPI v2.1.2a1, package: Open MPI 
> bbarrett@ip-172-31-64-10 Distribution, ident: 2.1.2a1, repo rev: 
> v2.1.1-59-gdc049e4, Unreleased developer copy, 148)
> 
> 
> 3.0.x:
> 
> % srun  -n 2 ./hello_c
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [ip-172-31-64-100:72545] Local abort before MPI_INIT completed completed 
> successfully, but am not able to aggregate error messages, and not able to 
> guarantee that all other processes were killed!
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [ip-172-31-64-100:72546] Local abort before MPI_INIT completed completed 
> successfully, but am not able to aggregate error messages, and not able to 
> guarantee that all other processes were killed!
> srun: error: ip-172-31-64-100: tasks 0-1: Exited with exit code 1
> 
> I don’t think it really matters, since v2.x probably wasn’t what the customer 
> wanted.
> 
> Brian
> 
>> On Jun 19, 2017, at 7:18 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>> 
>> Hi Ralph
>> 
>> I think the alternative you mention below should suffice.
>> 
>> Howard
>> 
>> r...@open-mpi.org <r...@open-mpi.org> wrote on Mon, Jun 19, 2017 at 07:24:
>> So what you guys want is for me to detect that no opal/pmix framework 
>> components could run, detect that we are in a SLURM job, and then print an 
>> error message saying “hey dummy - you didn’t configure us with SLURM PMI 
>> support”?
>> 
>> It means embedding slurm job detection code in the heart of ORTE (as opposed 
>> to in a component), which bothers me a bit.
>> 
>> As an alternative, what if I print a generic “you didn’t configure us with 
>> PMI support for this environment” message instead of the “pmix select 
>> failed” one? I can mention how to configure the support in a general way, 
>> which avoids having to embed SLURM detection into ORTE outside of a component.
>> 
>> > On Jun 16, 2017, at 8:39 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> >
>> > +1 on the error message.
>> >
>> >
>> >
>> >> On Jun 16, 2017, at 10:06 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>> >>
>> >> Hi Ralph
>> >>
>> >> I think a helpful error message would suffice.
>> >>
>> >> Howard
>> >>
>> >> r...@open-mpi.org <r...@open-mpi.org> wrote on Tue, Jun 13, 2017 at 11:15:
>> >> Hey folks
>> >>
>> >> Brian brought this up today on the call, so I spent a little time 
>> >> investigating. After installing SLURM 17.02 (with just --prefix as config 
>> >> args), I configured OMPI with just --prefix config args. Getting an 
>> >> allocation and then executing “srun ./hello” failed, as expected.
>> >>
>> >> However, configuring OMPI --with-pmi=<path-to-slurm> resolved the 
>> >> problem. SLURM continues to default to PMI-1, and so we pick that option 
>> >> up and use it. Everything works fine.
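>> >>
>> >> Concretely, the two configure lines I compared were roughly as follows
>> >> (prefix paths are placeholders):
>> >>
>> >>   $ ./configure --prefix=<ompi-prefix>                            # srun ./hello fails
>> >>   $ ./configure --prefix=<ompi-prefix> --with-pmi=<slurm-prefix>  # srun ./hello works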
>> >>
>> >> FWIW: I also went back and checked using SLURM 15.08 and got the 
>> >> identical behavior.
>> >>
>> >> So the issue is: we don’t pick up PMI support by default, and never have, 
>> >> due to the SLURM license issue. Thus, we have always required that the user 
>> >> explicitly configure --with-pmi so they take responsibility for the license. 
>> >> This is an acknowledged way of keeping the GPL from pulling OMPI under its 
>> >> umbrella, as it is the user, and not the OMPI community, that is making the 
>> >> link.
>> >>
>> >> I’m not sure there is anything we can or should do about this, other than 
>> >> perhaps providing a nicer error message. Thoughts?
>> >> Ralph
>> >>
>> >
>> >
>> > --
>> > Jeff Squyres
>> > jsquy...@cisco.com
>> >