No, neither ORTE nor OMPI makes any such assumption. That's up to the scheduler. We 
will launch a separate orted for each job, though, to avoid cross-contamination.

On Jun 19, 2014, at 8:00 AM, Pritchard, Howard P <howa...@lanl.gov> wrote:

> Hi Ralph,
>  
> Thanks for the explanation.  Does ORTE/OMPI always assume that for multi-node
> jobs there will only be one user’s job per node?  At my previous employer we
> had to make some changes to runtime components in order to support slurm, for
> which the customers’ default setting was to prefer filling nodes with jobs,
> even if that meant multi-node jobs of different users were intermingled within
> nodes.  The customers did not want to have to use the exclusive option.
>  
> Just a heads up in case folks who are working on Cray XE/XC systems are
> assuming that the way things work now with aprun will hold true going forward.
>  
> Howard
>  
>  
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, June 18, 2014 5:00 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] r31916 question
>  
> You know, looking at the code and the comments, the rationale for putting the 
> nids in order was to prep the list for the regex generator. If you look in 
> the plm_ras_module, you'll see that we pass the nodelist to 
> orte_plm_base_orted_append_basic_args. ORNL used static ports for alps to get 
> better scaling, and so that function creates a regular expression from the 
> nodelist. We then pass that to each orted upon launch so it can compute the 
> URI for all other orteds in the system, thus allowing it to connect back to 
> mpirun thru the routing tree (instead of making a direct connection).
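To illustrate why an ordered nid list helps when building a compact expression, here is a minimal sketch in C. It assumes a simplified "start-end" range notation; the helper name compress_nids and the output format are hypothetical and are not the actual regex produced by orte_plm_base_orted_append_basic_args. Consecutive ids only collapse into a single range when the input is in ascending order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of ORTE: writes a compact range
 * expression such as "1-3,7,9-10" for an ascending array of node ids.
 * Returns 0 on success, -1 if the output buffer is too small. */
int compress_nids(const int *nids, size_t count, char *out, size_t out_len)
{
    size_t used = 0;
    size_t i = 0;

    assert(nids != NULL || count == 0);
    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    while (i < count) {
        size_t j = i;
        int written;
        const char *sep = (i == 0) ? "" : ",";

        /* Extend the run while the next id is exactly one larger. */
        while (j + 1 < count && nids[j + 1] == nids[j] + 1) {
            j++;
        }

        if (j == i) {
            /* Single id with no consecutive neighbour. */
            written = snprintf(out + used, out_len - used, "%s%d", sep, nids[i]);
        } else {
            /* Run of consecutive ids collapses to "first-last". */
            written = snprintf(out + used, out_len - used, "%s%d-%d",
                               sep, nids[i], nids[j]);
        }
        if (written < 0 || (size_t)written >= out_len - used) {
            return -1; /* output would have been truncated */
        }
        used += (size_t)written;
        i = j + 1;
    }
    return 0;
}
```

With an unsorted list the same ids would produce many short entries instead of a few ranges, which is presumably why the list is sorted before the expression is generated.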
>  
> HTH
> Ralph
>  
> On Jun 18, 2014, at 3:55 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> 
> Ah, I see - yes, you'd get_attribute to retrieve it. Alternatively, you have 
> it sitting right there in an array, so you could just use the array to order 
> the list.
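As a rough sketch of ordering the list by launch id, the following C fragment sorts an array of entries with qsort. The demo_node_t type and the function names are hypothetical stand-ins for illustration; the real orte_node_t and the ALPS RAS module's data structures differ, and in the actual code the id would come from the attribute or the existing array rather than a struct field.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a node entry; the real orte_node_t differs. */
typedef struct {
    int launch_id;     /* nid, as it might be read from the attribute or array */
    const char *name;  /* illustrative node name */
} demo_node_t;

/* qsort comparator: ascending launch_id, written to avoid integer overflow. */
static int compare_launch_id(const void *a, const void *b)
{
    const demo_node_t *na = a;
    const demo_node_t *nb = b;
    return (na->launch_id > nb->launch_id) - (na->launch_id < nb->launch_id);
}

/* Sorts the entries in place so that nids appear in ascending order. */
void sort_nodes_by_launch_id(demo_node_t *nodes, size_t count)
{
    assert(nodes != NULL || count == 0);
    if (count > 1) {
        qsort(nodes, count, sizeof nodes[0], compare_launch_id);
    }
}
```

The comparator uses the comparison-difference idiom rather than subtracting the ids, so it stays correct even for very large or negative values.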
>  
>  
> On Jun 18, 2014, at 3:47 PM, Pritchard, Howard P <howa...@lanl.gov> wrote:
> 
> 
> Hi Ralph,
>  
> It is setting the attribute, but then for some reason there seems to be a
> need to have the node ids (nids) in ascending order, so there’s some code
> looking at the old launch_id field, which no longer exists.
>  
> I’m fixing it.  I’d like to learn the cycle of getting fixes in to trunk.
>  
> Thanks,
>  
> Howard
>  
>  
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, June 18, 2014 4:45 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] r31916 question
>  
> Huh - thought I got that. Sorry I missed it. Let me take a look and ensure 
> that the alps ras module is setting that attribute
>  
> On Jun 18, 2014, at 2:40 PM, Pritchard, Howard P <howa...@lanl.gov> wrote:
> 
> 
> 
> Hello Folks,
>  
> I’m looking at commit 31916 and notice a lot of fields were removed from
> orte_node_t.  This is now preventing ras_alps_module.c from compiling owing
> to use of a “launch_id” field.
>  
> In lieu of the direct use of launch_id, should I replace the code around line
> 587 of this file with use of orte_get_attribute, with ORTE_NODE_LAUNCH_ID as
> the attribute to be retrieved?
>  
> Thanks,
>  
> Howard
>  
>  
> -------------------------------------------------
> Howard Pritchard
> HPC-5
> Los Alamos National Laboratory
>  
>  
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/06/15008.php
>  
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/06/15010.php
>  
>  
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/06/15017.php
