Lenny's point is true - except for the danger of setting that MCA param and its possible impact on the ORTE daemons and mpirun (see my other note in that regard). However, it would only be useful if the same user were doing both runs.
I believe Tim was concerned about the case where two users are sharing nodes. There is no good solution for that case: two mpiruns launched by different users that share a node, with no knowledge of each other's actions, will collide.
We should probably warn about this in our FAQ or similar, since it is a fairly common use-case. The only thing I can think of is to recommend that people default to running without affinity, and only enable it when they -know- they have sole use of their nodes.
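For example, a minimal sketch of that recommendation - the exact parameter name is version-dependent, but in the 1.2/1.3 series processor affinity is typically controlled via the mpi_paffinity_alone MCA parameter:

  # safe default when nodes may be shared: leave affinity off
  mpirun -np 4 ./a.out
  # only when you -know- you have sole use of the nodes
  mpirun -np 4 -mca mpi_paffinity_alone 1 ./a.out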
On Jul 29, 2008, at 12:17 AM, Lenny Verkhovsky wrote:
For two separate runs we can use the slot_list parameter (opal_paffinity_base_slot_list) to get paffinity:
1: mpirun -mca opal_paffinity_base_slot_list "0-1"
2: mpirun -mca opal_paffinity_base_slot_list "2-3"
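For example (purely illustrative - the hostfile, core ranges, and executables are placeholders), two runs sharing a four-core node could pin their procs to disjoint cores:

  # run 1 binds its procs within cores 0-1
  mpirun -np 2 -hostfile shared_hosts -mca opal_paffinity_base_slot_list "0-1" ./app_a
  # run 2 binds its procs within cores 2-3
  mpirun -np 2 -hostfile shared_hosts -mca opal_paffinity_base_slot_list "2-3" ./app_b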
On 7/28/08, Ralph Castain <r...@lanl.gov> wrote:
Actually, this is true today regardless of this change. If two
separate mpirun invocations share a node and attempt to use
paffinity, they will conflict with each other. The problem isn't
caused by the hostfile sub-allocation. The problem is that the two
mpiruns have no knowledge of each other's actions, and hence assign
node ranks to each process independently.
Thus, we would have two procs that each think they are node rank 0 and should therefore bind to processor 0, and so on up the line.
Obviously, if you run within one mpirun and have two app_contexts,
the hostfile sub-allocation is fine - mpirun will track node rank
across the app_contexts. It is only the use of multiple mpiruns that
share nodes that causes the problem.
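As a concrete (hypothetical) illustration - the hostname and executables are placeholders, and mpi_paffinity_alone is assumed as the era-appropriate affinity parameter:

  # terminal 1: procs get node ranks 0,1 and bind to cores 0,1
  mpirun -np 2 -host node01 -mca mpi_paffinity_alone 1 ./app_a
  # terminal 2: procs independently also get node ranks 0,1 and bind to the same cores 0,1
  mpirun -np 2 -host node01 -mca mpi_paffinity_alone 1 ./app_b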
Several of us have discussed this problem and have a proposed
solution for 1.4. Once we get past 1.3 (someday!), we'll bring it to
the group.
On Jul 28, 2008, at 10:44 AM, Tim Mattox wrote:
My only concern is how this will interact with PLPA.
Say two Open MPI jobs each use "half" the cores (slots) on a particular node... how would they be able to bind themselves to disjoint sets of cores? I'm not asking you to solve this, Ralph; I'm just pointing it out so we can maybe warn users that if both jobs sharing a node try to use processor affinity, we don't make that magically work well, and in fact we would expect it to do quite poorly.
I could see disabling paffinity and/or warning if it was enabled for
one of these "fractional" nodes.
On Mon, Jul 28, 2008 at 11:43 AM, Ralph Castain <r...@lanl.gov> wrote:
Per an earlier telecon, I have modified the hostfile behavior slightly to allow hostfiles to subdivide allocations.
Briefly: given an allocation, we allow users to specify --hostfile on a per-app_context basis. In this mode, the hostfile info is used to filter the nodes that will be used for that app_context. However, the prior implementation only filtered the nodes themselves - i.e., it was a binary filter that allowed you to include or exclude an entire node.
The change now allows you to include a specified number of slots for a given node, as opposed to -all- slots from that node. You are limited to the number of slots included in the original allocation. I just realized that I hadn't output a warning if you attempt to violate this condition - will do so shortly. Rather than just abort if this happens, I set the allocation to that of the original - please let me know if you would prefer it to abort.
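A hypothetical sketch of the intended usage - the node name, slot counts, file name, and executables are placeholders, and the filtering semantics are as described above:

  # half_node01.txt - restrict this app_context to 4 of node01's 8 allocated slots
  node01 slots=4

  # first app_context is filtered by the hostfile; the second sees the full allocation
  mpirun -np 4 --hostfile half_node01.txt ./app_a : -np 4 ./app_b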
If you have interest in this behavior, please check it out and let me know whether it meets your needs.
Ralph