No simple solution, I fear. I know Matias et al. are looking at the dynamic 
situation. Getting the number of orteds, especially if/when they can be 
dynamically spawned, requires the notification method - it is in the plans, 
but not yet implemented.


> On Sep 29, 2016, at 6:13 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> 
> This is a follow-up of 
> https://mail-archive.com/users@lists.open-mpi.org/msg30055.html
> 
> Thanks, Matias, for the lengthy explanation.
> 
> 
> 
> Currently, PSM2_DEVICES is overwritten, so I do not think setting it before 
> invoking mpirun will help.
> 
> 
> 
> Also, in this specific case:
> 
> - the user is running within a SLURM allocation with 2 nodes
> 
> - the user specified a host file with 2 distinct nodes
> 
> 
> My first impression is that mtl/psm2 could/should handle this properly 
> (well, only one of the two conditions has to be met) and *not* set
> 
> export PSM2_DEVICES="self,shm"
> 
> 
> The patch below
> - does not overwrite PSM2_DEVICES
> - does not set PSM2_DEVICES when num_max_procs > num_total_procs
> This is suboptimal, but I could not find a way to get the number of orteds.
> IIRC, MPI_Comm_spawn can have an orted dynamically spawned by passing a host 
> in the MPI_Info (see the sketch below).
> If this host is not part of the hostfile (nor the RM allocation ?), then 
> PSM2_DEVICES must be set manually by the user.
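> 
> For illustration, a minimal sketch of that kind of spawn - the "host" info 
> key is the standard reserved key naming the target node; ./worker and the 
> hostname n2 are placeholders, not taken from the original report:
> 
> #include <mpi.h>
> 
> int main(int argc, char **argv)
> {
>     MPI_Comm intercomm;
>     MPI_Info info;
>     int errcode;
> 
>     MPI_Init(&argc, &argv);
>     MPI_Info_create(&info);
>     /* "host" names the node the spawned rank should run on;
>        "n2" is a placeholder hostname outside the hostfile */
>     MPI_Info_set(info, "host", "n2");
>     /* spawn one copy of ./worker (placeholder executable) */
>     MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info, 0,
>                    MPI_COMM_SELF, &intercomm, &errcode);
>     MPI_Info_free(&info);
>     MPI_Finalize();
>     return 0;
> }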
> 
> 
> Ralph,
> 
> Is there a way to get the number of orteds?
> - if I run mpirun -np 1 --host n0,n1 ..., orte_process_info.num_nodes is 1 
> (I wish I could get 2)
> - if running in singleton mode, orte_process_info.num_max_procs is 0 (is 
> this a bug or a feature?)
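> 
> (A hypothetical debug snippet, assuming it is compiled inside the Open MPI 
> tree where these headers and symbols are visible, to print the two values 
> discussed above:)
> 
> #include "orte/util/proc_info.h"
> #include "opal/util/output.h"
> 
> static void dump_orte_counts(void)
> {
>     /* the two fields discussed above; casts are for printing only */
>     opal_output(0, "num_nodes=%d num_max_procs=%d",
>                 (int)orte_process_info.num_nodes,
>                 (int)orte_process_info.num_max_procs);
> }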
> 
> Cheers,
> 
> Gilles
> 
> 
> diff --git a/ompi/mca/mtl/psm2/mtl_psm2_component.c b/ompi/mca/mtl/psm2/mtl_psm2_component.c
> index 26bccd2..52b906b 100644
> --- a/ompi/mca/mtl/psm2/mtl_psm2_component.c
> +++ b/ompi/mca/mtl/psm2/mtl_psm2_component.c
> @@ -14,6 +14,8 @@
>   * Copyright (c) 2012-2015 Los Alamos National Security, LLC.
>   *                         All rights reserved.
>   * Copyright (c) 2013-2016 Intel, Inc. All rights reserved
> + * Copyright (c) 2016      Research Organization for Information Science
> + *                         and Technology (RIST). All rights reserved.
>   * $COPYRIGHT$
>   *
>   * Additional copyrights may follow
> @@ -170,6 +172,13 @@ get_num_total_procs(int *out_ntp)
>  }
>  
>  static int
> +get_num_max_procs(int *out_nmp)
> +{
> +  *out_nmp = (int)ompi_process_info.max_procs;
> +  return OMPI_SUCCESS;
> +}
> +
> +static int
>  get_num_local_procs(int *out_nlp)
>  {
>      /* num_local_peers does not include us in
> @@ -201,7 +210,7 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
>      int        verno_major = PSM2_VERNO_MAJOR;
>      int verno_minor = PSM2_VERNO_MINOR;
>      int local_rank = -1, num_local_procs = 0;
> -    int num_total_procs = 0;
> +    int num_total_procs = 0, num_max_procs = 0;
>  
>      /* Compute the total number of processes on this host and our local rank
>       * on that node. We need to provide PSM2 with these values so it can
> @@ -221,6 +230,11 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
>                      "Cannot continue.\n");
>          return NULL;
>      }
> +    if (OMPI_SUCCESS != get_num_max_procs(&num_max_procs)) {
> +        opal_output(0, "Cannot determine max number of processes. "
> +                    "Cannot continue.\n");
> +        return NULL;
> +    }
>  
>      err = psm2_error_register_handler(NULL /* no ep */,
>                                      PSM2_ERRHANDLER_NOP);
> @@ -230,8 +244,10 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
>         return NULL;
>      }
>  
> -    if (num_local_procs == num_total_procs) {
> -      setenv("PSM2_DEVICES", "self,shm", 0);
> +    if ((num_local_procs == num_total_procs) && (num_max_procs <= num_total_procs)) {
> +        if (NULL == getenv("PSM2_DEVICES")) {
> +            setenv("PSM2_DEVICES", "self,shm", 0);
> +        }
>      }
>  
>      err = psm2_init(&verno_major, &verno_minor);
> 
> 
> 
> 
> 
> 
> 
> On 9/30/2016 12:38 AM, Cabral, Matias A wrote:
>> Hi Gilles et al.,
>>  
>> You are right, ptl.c is in the PSM2 code. As Ralph mentions, dynamic process 
>> support was/is not working in OMPI when using PSM2 because of an issue 
>> related to the transport keys. This was fixed in PR #1602 
>> (https://github.com/open-mpi/ompi/pull/1602) and should be included in 
>> v2.0.2. HOWEVER, this is not the error Juraj is seeing. The root of the 
>> assertion is that the PSM/PSM2 MTLs check where the “original” processes 
>> are running and, if they detect that all are local to the node, will ONLY 
>> initialize the shared memory device (variable PSM2_DEVICES="self,shm"). 
>> This is to avoid “reserving” HW resources in the HFI card that wouldn’t be 
>> used unless you later spawn ranks on other nodes. Therefore, to allow 
>> dynamic processes to be spawned on other nodes, you need to tell PSM2 to 
>> instruct the HW to initialize all the devices by making the environment 
>> variable PSM2_DEVICES="self,shm,hfi" available before running the job, as 
>> in the example below.
>> Note that while setting PSM2_DEVICES (*) will solve the assertion below, 
>> you will most likely still see the transport key issue if PR #1602 is not 
>> included.
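>> 
>> A minimal illustration of that workaround, reusing the hostfile and manager 
>> program from Juraj's report quoted below (only the export line is new):
>> 
>>   $ export PSM2_DEVICES="self,shm,hfi"
>>   $ mpirun -np 1 -npernode 1 --hostfile my_hosts ./manager 1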
>>  
>> Thanks,
>>  
>> _MAC
>>  
>> (*)
>> PSM2_DEVICES -> Omni Path
>> PSM_DEVICES  -> TrueScale
>>  
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of 
>> r...@open-mpi.org
>> Sent: Thursday, September 29, 2016 7:12 AM
>> To: Open MPI Users <us...@lists.open-mpi.org>
>> Subject: Re: [OMPI users] MPI_Comm_spawn
>>  
>> Ah, that may be why it wouldn’t show up in the OMPI code base itself. If 
>> that is the case here, then no - OMPI v2.0.1 does not support comm_spawn 
>> for PSM. It is fixed in the upcoming v2.0.2.
>>  
>> On Sep 29, 2016, at 6:58 AM, Gilles Gouaillardet 
>> <gilles.gouaillar...@gmail.com> wrote:
>>  
>> Ralph,
>>  
>> My guess is that ptl.c comes from the PSM lib ...
>>  
>> Cheers,
>>  
>> Gilles
>> 
>> On Thursday, September 29, 2016, r...@open-mpi.org wrote:
>> Spawn definitely does not work with srun. I don’t recognize the name of the 
>> file that segfaulted - what is “ptl.c”? Is that in your manager program?
>>  
>>  
>> On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet 
>> <gilles.gouaillar...@gmail.com> wrote:
>>  
>> Hi,
>>  
>> I do not expect spawn to work with direct launch (e.g. srun).
>>  
>> Do you have PSM (e.g. InfiniPath) hardware? That could be linked to the 
>> failure.
>>  
>> Can you please try
>>  
>> mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts 
>> ./manager 1
>>  
>> and see if it helps?
>>  
>> Note that if you have the possibility, I suggest you first try that without 
>> Slurm, and then within a Slurm job.
>>  
>> Cheers,
>>  
>> Gilles
>> 
>> On Thursday, September 29, 2016, juraj2...@gmail.com wrote:
>> Hello,
>>  
>> I am using MPI_Comm_spawn to dynamically create new processes from a single 
>> manager process. Everything works fine when all the processes are running 
>> on the same node, but imposing a restriction to run only a single process 
>> per node does not work. Below are the errors produced during a multinode 
>> interactive session and a multinode sbatch job.
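>> 
>> (Not my actual code - that is in the repository linked below - just a 
>> minimal sketch of the pattern, with ./worker as a placeholder name:)
>> 
>> #include <mpi.h>
>> #include <stdlib.h>
>> 
>> int main(int argc, char **argv)
>> {
>>     MPI_Comm intercomm;
>>     int nworkers;
>> 
>>     MPI_Init(&argc, &argv);
>>     nworkers = (argc > 1) ? atoi(argv[1]) : 1;  /* e.g. ./manager 1 */
>>     /* the single manager rank spawns nworkers new worker processes */
>>     MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
>>                    0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
>>     MPI_Finalize();
>>     return 0;
>> }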
>>  
>> The system I am using is: Linux version 3.10.0-229.el7.x86_64 
>> (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 
>> (Red Hat 4.8.2-16) (GCC)).
>> I am using Open MPI 2.0.1.
>> Slurm is version 15.08.9.
>>  
>> What is preventing my jobs from spawning on multiple nodes? Does Slurm 
>> require some additional configuration to allow it? Is it an issue on the 
>> MPI side - does it need to be compiled with some special flag (I have 
>> compiled it with --enable-mpi-fortran=all --with-pmi)?
>>  
>> The code I am launching is here: https://github.com/goghino/dynamicMPI
>>  
>> The manager tries to launch one new process (./manager 1). Below is the 
>> error produced when requesting each process to be located on a different 
>> node (interactive session):
>> $ salloc -N 2
>> $ cat my_hosts
>> icsnode37
>> icsnode38
>> $ mpirun -np 1 -npernode 1 --hostfile my_hosts ./manager 1
>> [manager]I'm running MPI 3.1
>> [manager]Runing on node icsnode37
>> icsnode37.12614Assertion failure at ptl.c:183: epaddr == ((void *)0)
>> icsnode38.32443Assertion failure at ptl.c:183: epaddr == ((void *)0)
>> [icsnode37:12614] *** Process received signal ***
>> [icsnode37:12614] Signal: Aborted (6)
>> [icsnode37:12614] Signal code:  (-6)
>> [icsnode38:32443] *** Process received signal ***
>> [icsnode38:32443] Signal: Aborted (6)
>> [icsnode38:32443] Signal code:  (-6)
>>  
>> The same example as above via sbatch job submission:
>> $ cat job.sbatch
>> #!/bin/bash
>>  
>> #SBATCH --nodes=2
>> #SBATCH --ntasks-per-node=1
>>  
>> module load openmpi/2.0.1
>> srun -n 1 -N 1 ./manager 1
>>  
>> $ cat output.o
>> [manager]I'm running MPI 3.1
>> [manager]Runing on node icsnode39
>> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>> [icsnode39:9692] *** An error occurred in MPI_Comm_spawn
>> [icsnode39:9692] *** reported by process [1007812608,0]
>> [icsnode39:9692] *** on communicator MPI_COMM_SELF
>> [icsnode39:9692] *** MPI_ERR_SPAWN: could not spawn processes
>> [icsnode39:9692] *** MPI_ERRORS_ARE_FATAL (processes in this communicator 
>> will now abort,
>> [icsnode39:9692] ***    and potentially your MPI job)
>> In: PMI_Abort(50, N/A)
>> slurmstepd: *** STEP 15378.0 ON icsnode39 CANCELLED AT 2016-09-26T16:48:20 
>> ***
>> srun: error: icsnode39: task 0: Exited with exit code 50
>>  
>> Thanks for any feedback!
>>  
>> Best regards,
>> Juraj
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
