For two separate runs we can use the slot_list parameter (opal_paffinity_base_slot_list) to set paffinity explicitly, e.g. (a fuller sketch follows at the bottom of this message):
1: mpirun -mca opal_paffinity_base_slot_list "0-1"
2: mpirun -mca opal_paffinity_base_slot_list "2-3"

On 7/28/08, Ralph Castain <r...@lanl.gov> wrote:
> Actually, this is true today regardless of this change. If two separate
> mpirun invocations share a node and attempt to use paffinity, they will
> conflict with each other. The problem isn't caused by the hostfile
> sub-allocation. The problem is that the two mpiruns have no knowledge of
> each other's actions, and hence assign node ranks to each process
> independently.
>
> Thus, we would have two procs that think they are node rank=0 and should
> therefore bind to the 0 processor, and so on up the line.
>
> Obviously, if you run within one mpirun and have two app_contexts, the
> hostfile sub-allocation is fine - mpirun will track node rank across the
> app_contexts. It is only the use of multiple mpiruns that share nodes
> that causes the problem.
>
> Several of us have discussed this problem and have a proposed solution
> for 1.4. Once we get past 1.3 (someday!), we'll bring it to the group.
>
> On Jul 28, 2008, at 10:44 AM, Tim Mattox wrote:
>
>> My only concern is how will this interact with PLPA.
>> Say two Open MPI jobs each use "half" the cores (slots) on a
>> particular node... how would they be able to bind themselves to
>> a disjoint set of cores? I'm not asking you to solve this Ralph, I'm
>> just pointing it out so we can maybe warn users that if both jobs
>> sharing a node try to use processor affinity, we don't make that
>> magically work well, and that we would expect it to do quite poorly.
>>
>> I could see disabling paffinity and/or warning if it was enabled for
>> one of these "fractional" nodes.
>>
>> On Mon, Jul 28, 2008 at 11:43 AM, Ralph Castain <r...@lanl.gov> wrote:
>>
>>> Per an earlier telecon, I have modified the hostfile behavior slightly
>>> to allow hostfiles to subdivide allocations.
>>>
>>> Briefly: given an allocation, we allow users to specify --hostfile on a
>>> per-app_context basis. In this mode, the hostfile info is used to
>>> filter the nodes that will be used for that app_context. However, the
>>> prior implementation only filtered the nodes themselves - i.e., it was
>>> a binary filter that allowed you to include or exclude an entire node.
>>>
>>> The change now allows you to include a specified #slots for a given
>>> node as opposed to -all- slots from that node. You are limited to the
>>> #slots included in the original allocation. I just realized that I
>>> hadn't output a warning if you attempt to violate this condition - will
>>> do so shortly. Rather than just abort if this happens, I set the
>>> allocation to that of the original - please let me know if you would
>>> prefer it to abort.
>>>
>>> If you have interest in this behavior, please check it out and let me
>>> know if this meets needs.
>>>
>>> Ralph
>>
>> --
>> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>> tmat...@gmail.com || timat...@open-mpi.org
>> I'm a bright... http://www.the-brights.net/
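
For completeness, a fuller sketch of the two invocations referenced above.
The node name, application names, and -np counts are hypothetical; it assumes
both jobs land on the same 4-core node with an Open MPI 1.3-era build that
supports the opal_paffinity_base_slot_list parameter:

  # job 1: constrain its procs to cores 0-1 on the shared node
  mpirun -np 2 --host node01 -mca opal_paffinity_base_slot_list "0-1" ./app_a

  # job 2: constrain its procs to cores 2-3, disjoint from job 1
  mpirun -np 2 --host node01 -mca opal_paffinity_base_slot_list "2-3" ./app_b

And for the single-mpirun, two-app_context case with Ralph's per-app_context
hostfile sub-allocation, something like the following might work (hostfile
names and slot counts again hypothetical, assuming node01 was allocated at
least 4 slots):

  # app1.hosts: request only 2 of node01's allocated slots for this app_context
  node01 slots=2
  # app2.hosts: another 2 slots for the second app_context
  node01 slots=2

  mpirun -np 2 --hostfile app1.hosts ./app_a : -np 2 --hostfile app2.hosts ./app_b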