We have used '^' elsewhere to indicate negation, so the syntax could simply be: put '^' at the beginning of a line, and that node is not used.

So we could have:
n0
n1
^headnode
n3
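
To make the proposed syntax concrete, here is a minimal sketch of how such a file might be read. It is only an illustration in Python, not Open MPI code: the function name parse_hostfile is hypothetical, and skipping '#' comment lines and ignoring anything after the node name (such as slot counts) are assumptions on my part.

```python
def parse_hostfile(text):
    """Split hostfile text into (included, excluded) lists of node names.

    Assumption: a leading '^' marks a node that must not be used,
    as proposed above. Blank lines and '#' comments are skipped.
    """
    included, excluded = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("^"):
            fields = line[1:].split()
            if fields:
                # Only the node name matters for an excluded entry.
                excluded.append(fields[0])
        else:
            # Keep the node name, drop any trailing fields.
            included.append(line.split()[0])
    return included, excluded


example = "n0\nn1\n^headnode\nn3\n"
included, excluded = parse_hostfile(example)
print(included, excluded)
```

With the four-line example above, headnode ends up in the excluded list and the other three nodes in the included list.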

I understand the idea of having a flag to indicate that all nodes below a certain point should be ignored, but I think this might get confusing, and I'm unsure how useful it would be. I see this mainly as a way to block out a couple of nodes by default. Besides, if you do want to block out many nodes, any reasonable text editor lets you insert '^' in front of any number of lines easily.

Alternatively, for the particular situation that Edgar mentions, it may be good enough just to set rmaps_base_no_schedule_local in the mca params default file.

One question though: If I am in a slurm allocation which contains n1, and there is a default hostfile that contains "^n1", will I run on 'n1'?

I'm not sure what the answer is; I know we talked about the precedence earlier...

Tim

Ralph H Castain wrote:
I personally have no objection, but I would ask then that the wiki be
modified to cover this case. All I require is that someone define the syntax
to be used to indicate "this is a node I do -not- want used", or
alternatively a flag that indicates "all nodes below are -not- to be used".

Implementation isn't too hard once I have that...


On 3/3/08 9:44 AM, "Edgar Gabriel" <gabr...@cs.uh.edu> wrote:

Ralph,

could this mechanism be used also to exclude a node, indicating to never
run a job there? Here is the problem that I face quite often: students
working on the homework forget to allocate a partition on the cluster,
and just type mpirun. Because of that, all jobs end up running on the
front-end node.

If we now had the ability to specify in a default hostfile to
never run a job on a specified node (e.g. the front end node), users
would get an error message when trying to do that. I am aware that
that's a little ugly...

Thanks
edgar

Ralph Castain wrote:
I forget all the formatting we are supposed to use, so I hope you'll all
just bear with me.

George brought up the fact that we used to have an MCA param to specify a
hostfile to use for a job. The hostfile behavior described on the wiki,
however, doesn't provide for that option. It associates a hostfile with a
specific app_context, and provides a detailed hierarchical layout of how
mpirun is to interpret that information.

What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
to replace the deprecated capability. If found, the system's behavior will
be:

1. in a managed environment, the default hostfile will be used to filter the
discovered nodes to define the available node pool. Any hostfile and/or dash
host options provided to an app_context will be used to further filter the
node pool to define the specific nodes for use by that app_context. Thus,
nodes in the hostfile and dash host options given to an app_context -must-
also be in the default hostfile in order to be available for use by that
app_context - any nodes in the app_context options that are not in the
default hostfile will be ignored.

2. in an unmanaged environment, the default hostfile will be used to define
the available node pool. Any hostfile and/or dash host options provided to
an app_context will be used to filter the node pool to define the specific
nodes for use by that app_context, subject to the previous caveat. However,
add-hostfile and add-host options will add nodes to the node pool for use
-only- by the associated app_context.
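
The two cases above can be sketched as a pair of filters. This is a hedged illustration in Python, not the actual implementation: the function names available_pool and app_context_nodes, and the use of plain lists of node names, are assumptions made for the example.

```python
def available_pool(default_hostfile, discovered=None):
    """Return the available node pool.

    Managed environment (discovered is given): the default hostfile
    filters the discovered nodes.
    Unmanaged environment (discovered is None): the default hostfile
    itself defines the pool.
    """
    if discovered is not None:
        allowed = set(default_hostfile)
        return [n for n in discovered if n in allowed]
    return list(default_hostfile)


def app_context_nodes(pool, requested=None, added=None):
    """Return the nodes one app_context may use.

    requested: nodes from its hostfile and/or dash-host options; any
               that are not in the pool are ignored.
    added:     nodes from add-hostfile/add-host options; they extend
               the result for this app_context only.
    """
    if requested is None:
        nodes = list(pool)
    else:
        wanted = set(requested)
        nodes = [n for n in pool if n in wanted]
    for n in added or []:
        if n not in nodes:
            nodes.append(n)
    return nodes


# Managed: the allocation holds n1..n3, the default hostfile lists n0, n1, n3.
managed_pool = available_pool(["n0", "n1", "n3"], discovered=["n1", "n2", "n3"])
managed_nodes = app_context_nodes(managed_pool, requested=["n3", "n9"])

# Unmanaged: the default hostfile alone defines the pool.
unmanaged_pool = available_pool(["n0", "n1"])
unmanaged_nodes = app_context_nodes(unmanaged_pool, requested=["n1"], added=["n7"])
```

In the managed example, n9 is dropped because it is not in the pool; in the unmanaged example, n7 is usable by that one app_context without ever joining the shared pool.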


I believe this proposed behavior is consistent with that described on the
wiki, and would be relatively easy to implement. If nobody objects, I will
do so by end-of-day 3/6.

Comments, suggestions, objections - all are welcome!
Ralph


_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


