Did a little digging into this last night, and finally figured out what
you were getting at in your comments here. Yeah, I think an "affinity"
framework would definitely be the best approach - it could handle both
CPU and memory, I imagine. It isn't clear how pressing this is, since it
is mostly an optimization issue, but you're welcome to create the
framework if you like.


On Sun, 2005-07-17 at 09:13, Jeff Squyres wrote:

> It needs to be done in the launched process itself.  So we'd either 
> have to extend rmaps (from my understanding of rmaps, that doesn't seem 
> like a good idea), or do something different.
> 
> Perhaps the easiest thing to do is to add this to the LANL meeting 
> agenda...?  Then we can have a whiteboard to discuss.  :-)
> 
> 
> 
> On Jul 17, 2005, at 10:26 AM, Ralph Castain wrote:
> 
> > Wouldn't it belong in the rmaps framework? That's where we tell the
> > launcher where to put each process - seems like a natural fit.
> >
> >
> > On Jul 17, 2005, at 6:45 AM, Jeff Squyres wrote:
> >
> >> I'm thinking that we should add some processor affinity code to OMPI --
> >> possibly in the ORTE layer (ORTE is the interface to the back-end
> >> launcher, after all).  This will really help on systems like Opterons
> >> (and others) to prevent processes from bouncing between processors, and
> >> potentially getting located far from "their" RAM.
> >>
> >> This has the potential to help even micro-benchmark results (e.g.,
> >> ping-pong).  It's going to be quite relevant for my shared memory
> >> collective work on mauve.
> >>
> >>
> >> General scheme:
> >> ---------------
> >>
> >> I think that somewhere in ORTE, we should actively set processor
> >> affinity when:
> >>    - Supported by the OS
> >>    - Not disabled by the user (via MCA param)
> >>    - The node is not over-subscribed with processes from this job
> >>
> >> Generally speaking, if you launch <=N processes in a job on a node
> >> (where N == number of CPUs on that node), then we set processor
> >> affinity.  We set each process's affinity to the CPU number according
> >> to the VPID ordering of the procs in that job on that node.  So if you
> >> launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
> >> go to processor 1, etc. (it's an easy, locally-determined ordering).
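> >>
> >> Just to make that ordering concrete, here's a rough sketch (the
> >> function and argument names are made up -- not real ORTE data
> >> structures -- and it assumes the oversubscription check above has
> >> already passed):
> >>
> >>   #include <stdlib.h>
> >>
> >>   /* made-up helper: a process's CPU index is just its position in
> >>      the sorted list of this job's VPIDs on this node
> >>      (VPIDs 5,6,7,8 -> CPUs 0,1,2,3) */
> >>   static int cmp_vpid(const void *a, const void *b)
> >>   {
> >>       return *(const int *)a - *(const int *)b;
> >>   }
> >>
> >>   int local_cpu_index(int *node_vpids, int nprocs, int my_vpid)
> >>   {
> >>       int i;
> >>       qsort(node_vpids, nprocs, sizeof(int), cmp_vpid);
> >>       for (i = 0; i < nprocs; ++i) {
> >>           if (node_vpids[i] == my_vpid) {
> >>               return i;      /* i-th local proc -> CPU i */
> >>           }
> >>       }
> >>       return -1;             /* our VPID isn't on this node?! */
> >>   }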
> >>
> >> Someday, we might want to make this scheme universe-aware (i.e., see 
> >> if
> >> any other ORTE jobs are running on that node, and not schedule on any
> >> processors that are already claimed by the processes on that(those)
> >> job(s)), but I think single-job awareness is sufficient for the 
> >> moment.
> >>
> >>
> >> Implementation:
> >> ---------------
> >>
> >> We'll need relevant configure tests to figure out if the target system
> >> has CPU affinity system calls.  Those are simple to add.
> >>
> >> We could simply use #if statements for the affinity stuff or make it a
> >> real framework.  Since it's only one function call to set the affinity,
> >> I tend to lean towards the [simpler] #if solution, but I could probably
> >> be pretty easily convinced that a framework is the Right solution.  I'm
> >> on the fence (and if someone convinces me, I'd volunteer for the extra
> >> work to set up the framework).
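> >>
> >> To show how small the #if version would be, here's a rough sketch
> >> (OMPI_HAVE_SCHED_SETAFFINITY is a made-up name for whatever symbol
> >> the configure test would end up defining; the call itself is Linux's
> >> sched_setaffinity()):
> >>
> >>   #if OMPI_HAVE_SCHED_SETAFFINITY   /* made-up name; set by configure */
> >>   #define _GNU_SOURCE
> >>   #include <sched.h>
> >>   #endif
> >>
> >>   static void try_set_affinity(int my_cpu_index)
> >>   {
> >>   #if OMPI_HAVE_SCHED_SETAFFINITY
> >>       cpu_set_t mask;
> >>       CPU_ZERO(&mask);
> >>       CPU_SET(my_cpu_index, &mask);
> >>       /* bind the calling process; failure isn't fatal -- we just
> >>          run unbound */
> >>       (void)sched_setaffinity(0, sizeof(mask), &mask);
> >>   #else
> >>       (void)my_cpu_index;           /* no affinity support found */
> >>   #endif
> >>   }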
> >>
> >> I'm not super-familiar with the processor-affinity stuff (e.g., for
> >> best effect, should it be done after the fork and before the exec?), so
> >> I'm not sure exactly where this would go in ORTE.  Potentially either
> >> before new processes are exec'd (where we only have that kind of
> >> control in some systems, like rsh/ssh) or right up very near the top
> >> of orte_init().
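> >>
> >> If it really does need to happen between the fork and the exec, then
> >> in the launchers where we fork locally (e.g., rsh/ssh) it might look
> >> roughly like this (again just a sketch; try_set_affinity() is the
> >> made-up helper from above):
> >>
> >>   #include <stdlib.h>
> >>   #include <unistd.h>
> >>
> >>   void try_set_affinity(int my_cpu_index);  /* made-up helper above */
> >>
> >>   static pid_t launch_one(char **argv, int my_cpu_index)
> >>   {
> >>       pid_t pid = fork();
> >>       if (0 == pid) {                       /* child */
> >>           try_set_affinity(my_cpu_index);   /* bind before the exec */
> >>           execvp(argv[0], argv);
> >>           exit(1);                  /* only reached if the exec fails */
> >>       }
> >>       return pid;                   /* parent: child's pid, or -1 */
> >>   }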
> >>
> >> Comments?
> >>
> >> -- 
> >> {+} Jeff Squyres
> >> {+} The Open MPI Project
> >> {+} http://www.open-mpi.org/
> >>
