No, neither ORTE nor OMPI makes any such assumption. That's up to the scheduler. We will launch a separate orted for each job, though, to avoid cross-contamination.

On Jun 19, 2014, at 8:00 AM, Pritchard, Howard P <howa...@lanl.gov> wrote:

> Hi Ralph,
>
> Thanks for the explanation. Does ORTE/OMPI always assume that for
> multi-node jobs, there will only be one user’s job/node? At my previous
> employer we were having to do some changes to runtime components in order
> to support slurm, for which the customers’ default setting was to prefer
> filling of nodes with jobs even if that meant multi-node jobs of different
> users were intermingled within nodes. The customers did not want to have
> to use the exclusive option.
>
> Just a heads up if folks who are working on cray xe/xc systems are making
> assumptions that the way things work now with aprun will hold true going
> forwards.
>
> Howard
>
>
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, June 18, 2014 5:00 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] r31916 question
>
> You know, looking at the code and the comments, the rationale for putting
> the nids in order was to prep the list for the regex generator. If you
> look in the plm_ras_module, you'll see that we pass the nodelist to
> orte_plm_base_orted_append_basic_args. ORNL used static ports for alps to
> get better scaling, and so that function creates a regular expression from
> the nodelist. We then pass that to each orted upon launch so it can
> compute the URI for all other orteds in the system, thus allowing it to
> connect back to mpirun thru the routing tree (instead of making a direct
> connection).
>
> HTH
> Ralph
>
> On Jun 18, 2014, at 3:55 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>
> Ah, I see - yes, you'd get_attribute to retrieve it. Alternatively, you
> have it sitting right there in an array, so you could just use the array
> to order the list
>
>
> On Jun 18, 2014, at 3:47 PM, Pritchard, Howard P <howa...@lanl.gov> wrote:
>
>
> Hi Ralph,
>
> It is setting the attribute, but then for some reason there seems to be a
> need to have the node ids (nids) in ascending order, so there’s some code
> looking at the old launch_id field, which no longer exists.
>
> I’m fixing it. I’d like to learn the cycle of getting fixes into trunk.
>
> Thanks,
>
> Howard
>
>
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, June 18, 2014 4:45 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] r31916 question
>
> Huh - thought I got that. Sorry I missed it. Let me take a look and ensure
> that the alps ras module is setting that attribute
>
> On Jun 18, 2014, at 2:40 PM, Pritchard, Howard P <howa...@lanl.gov> wrote:
>
>
>
> Hello Folks,
>
> I’m looking at commit 31916 and notice a lot of fields were removed from
> orte_node_t. This is now preventing ras_alps_module.c from compiling owing
> to use of a “launch_id” field.
>
> In lieu of the direct use of launch_id, should I replace the code around
> 587 of this file with use of orte_get_attribute with ORTE_NODE_LAUNCH_ID
> for the attribute to be retrieved?
>
> Thanks,
>
> Howard
>
>
> -------------------------------------------------
> Howard Pritchard
> HPC-5
> Los Alamos National Laboratory
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/06/15008.php
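
Editorial note on the nid ordering discussed in the thread: the point about sorting the nids before handing the nodelist to the regex generator is easier to see with a toy example. The sketch below is only an illustration, written in Python for brevity rather than in ORTE's C. The function names compress_nids and expand_nids and the "lo-hi,lo-hi" range syntax are assumptions made up for this note; they are not ORTE's actual functions or its regex format. It shows why ascending order helps (consecutive nids collapse into one range) and how a receiver could expand the compact string back into the full list.

```python
def compress_nids(nids):
    """Sort node ids and collapse consecutive runs into 'lo-hi' ranges.

    Hypothetical helper for illustration only; not ORTE's regex generator.
    """
    ordered = sorted(set(nids))      # ascending order, duplicates dropped
    runs = []                        # each entry is [first_nid, last_nid]
    for nid in ordered:
        if runs and nid == runs[-1][1] + 1:
            runs[-1][1] = nid        # extends the current consecutive run
        else:
            runs.append([nid, nid])  # starts a new run
    return ",".join(
        str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in runs
    )


def expand_nids(expr):
    """Rebuild the ascending nid list from a compressed expression."""
    if not expr:
        return []
    nids = []
    for part in expr.split(","):
        lo, _, hi = part.partition("-")
        # A bare number such as "20" has no upper bound, so reuse lo.
        nids.extend(range(int(lo), int(hi or lo) + 1))
    return nids


if __name__ == "__main__":
    expr = compress_nids([12, 10, 11, 20])
    print(expr)
    print(expand_nids(expr))
```

With the nids left unsorted, 12, 10, 11 would not be recognized as a single run by a one-pass scan like this, which is one plausible reading of why the old code wanted them in ascending order.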