Just an idea: maybe it is worth providing a brand new command-line
option to mpirun. This option would accept a filename and support a
combined syntax for machinefile/hostfile (to define allocations) and
rankfile (to define placement).
YAML syntax (http://www.yaml.org/start.html) could be used to describe
the file's primitives, for example:
$ mpirun -clusterfile /path/to/clusterfile
$ cat clusterfile
hostX:
  slots : int
  maxslots : int
  ranks : rankid[@socket:core]
Example clusterfile:
===============
hostX:
  slots : 4
  maxslots : 4
  ranks : 1,16,22
hostY:
  slots : 8
  maxslots : 8
  ranks : 1@0:*, 3@2-3, 4@0:1, 5
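A minimal Python sketch of how the proposed `ranks` value could be tokenized, assuming the grammar is a comma-separated list of `rankid[@placement]` items and treating the placement (e.g. `0:*`, `2-3`) as an opaque string. `parse_rank_specs` is a hypothetical helper, not part of Open MPI.

```python
from typing import List, Optional, Tuple


def parse_rank_specs(value: str) -> List[Tuple[int, Optional[str]]]:
    """Split a proposed clusterfile 'ranks' value into (rank, placement) pairs.

    Assumed grammar: rankid[@placement], items separated by commas.
    The placement is returned verbatim; None means no placement was given.
    """
    specs: List[Tuple[int, Optional[str]]] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        rank_str, sep, placement = item.partition("@")
        rank = int(rank_str)  # raises ValueError on a malformed rank id
        if sep and not placement:
            raise ValueError(f"empty placement in {item!r}")
        specs.append((rank, placement if sep else None))
    return specs


print(parse_rank_specs("1@0:*, 3@2-3, 4@0:1, 5"))
# [(1, '0:*'), (3, '2-3'), (4, '0:1'), (5, None)]
```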
By doing so, we keep backwards compatibility: after reading the
clusterfile, the code would process the *hostfile* and *rankfile*
parts as it does today.
What do you think?
Mike
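Splitting a parsed clusterfile back into today's two inputs could look roughly like this sketch. It emits hostfile-style lines (`host slots=N max_slots=M`) and rankfile lines in the `rank N=host slot=S` form used in Eugene's example. Assumptions: the dict layout and `split_clusterfile` are hypothetical, the `max_slots` keyword should be checked against the hostfile documentation for the Open MPI version in use, and ranks given without a placement are returned separately because the proposal does not say how they map to rankfile syntax.

```python
from typing import Dict, List, Tuple

# Hypothetical parsed form of a clusterfile:
# host -> {"slots": int, "maxslots": int, "ranks": [(rank, placement or None)]}
Cluster = Dict[str, dict]


def split_clusterfile(cluster: Cluster) -> Tuple[List[str], List[str], List[Tuple[int, str]]]:
    """Return (hostfile_lines, rankfile_lines, unplaced_ranks)."""
    hostfile: List[str] = []
    rankfile: List[str] = []
    unplaced: List[Tuple[int, str]] = []
    owner: Dict[int, str] = {}  # rank -> host, to catch a rank listed twice
    for host, spec in cluster.items():
        # Allocation part: one hostfile-style line per host.
        hostfile.append(f"{host} slots={spec['slots']} max_slots={spec['maxslots']}")
        # Placement part: one rankfile-style line per rank with a placement.
        for rank, placement in spec.get("ranks", []):
            if rank in owner:
                raise ValueError(f"rank {rank} listed under {owner[rank]} and {host}")
            owner[rank] = host
            if placement is None:
                unplaced.append((rank, host))
            else:
                rankfile.append(f"rank {rank}={host} slot={placement}")
    return hostfile, rankfile, unplaced


hosts, ranks, bare = split_clusterfile({
    "hostY": {"slots": 8, "maxslots": 8,
              "ranks": [(3, "2-3"), (4, "0:1"), (5, None)]},
})
print(hosts)  # ['hostY slots=8 max_slots=8']
print(ranks)  # ['rank 3=hostY slot=2-3', 'rank 4=hostY slot=0:1']
print(bare)   # [(5, 'hostY')]
```

In the example clusterfile above, rank 1 appears under both hostX and hostY; this sketch rejects such a file, which suggests the combined format needs an explicit rule for that case.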
On Mon, Jun 22, 2009 at 1:30 PM, Terry Dontje <terry.don...@sun.com> wrote:
Let us think about this some more. We'll try and reply later today.
--td
Ralph Castain wrote:
Had a chance to think about how this might be done, and looked
at it for a while after getting home. I -think- I found a way
to do it, but there are a couple of caveats:
1. Len's point about oversubscribing without warning would
definitely hold true - this would positively be a "user
beware" option
2. There could be no RM-provided allocation, hostfile, or
-host options specified. Basically, I would be adding the
"read rankfile" option to the end of the current allocation
determination procedure.
I would still allow more procs than shown in the rankfile
(mapping the rest bynode on the nodes specified in the
rankfile - can't do byslot because I don't know how many slots
are on each node), which means the only change in behavior
would be the forced bynode mapping of unspecified procs.
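The "map the rest bynode" behavior described here can be sketched as a plain round-robin over the nodes named in the rankfile. This is an illustration under assumptions (the node order and the helper `map_leftovers_bynode` are hypothetical), not the ORTE bynode mapper.

```python
from typing import Dict, List


def map_leftovers_bynode(np_total: int, rankfile: Dict[int, str]) -> Dict[int, str]:
    """Place ranks missing from the rankfile round-robin (bynode) over its nodes."""
    if any(r < 0 or r >= np_total for r in rankfile):
        raise ValueError("rankfile names a rank outside 0..np-1")
    # Nodes in order of first appearance, scanning ranks in ascending order;
    # this is the only node list available without a hostfile or RM allocation.
    nodes: List[str] = []
    for rank in sorted(rankfile):
        if rankfile[rank] not in nodes:
            nodes.append(rankfile[rank])
    if not nodes:
        raise ValueError("rankfile names no nodes")
    mapping = dict(rankfile)
    next_node = 0
    for rank in range(np_total):
        if rank in mapping:
            continue
        # Bynode: one rank per node per pass. Slot counts are unknown here,
        # so nothing prevents oversubscription.
        mapping[rank] = nodes[next_node % len(nodes)]
        next_node += 1
    return mapping


print(map_leftovers_bynode(5, {0: "node0", 1: "node1"}))
# {0: 'node0', 1: 'node1', 2: 'node0', 3: 'node1', 4: 'node0'}
```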
So use of this option will entail some risks and a slight
difference in behavior, but would relieve you from the burden
of having to provide a hostfile. I'm not personally convinced
it is worth the risk and probable user complaints of "it
didn't work", but since we don't use this option, I don't have
a strong opinion on the matter.
Let's just avoid going back-and-forth over wanting it, or how
it should be implemented - let's get it all ironed out, and
then implement it once, like we finally did at the end with
the whole hostfile thing.
Let me know if you want me to do this - it obviously isn't at
the top of my priority list, but still could be done in the
next few weeks.
Ralph
On Jun 21, 2009, at 9:00 AM, Lenny Verkhovsky wrote:
Sorry for the delay in responding. I totally agree with
Ralph that it's not as easy as it seems:
1. The rankfile mapper uses machines that are already
allocated (by the scheduler or a hostfile). If we use the
rankfile as a hostfile, we can run into problems trying to
use unallocated nodes, which can hang the run.
2. We can't define the number of slots on each machine in a
rankfile, which means oversubscription can take place
without any warning.
3. I personally don't see any problem with using a hostfile,
even if it has redundant info; the hostfile and the rankfile
belong to different layers in the system and solve different
problems. The original hostfile (if I recall correctly)
could bind a rank to a node, but the syntax wasn't very
flexible or clear.
Lenny.
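Lenny's second point can be made concrete with a small check that compares ranks per host against slot counts, when any are known. The function `oversubscription_warnings` is a hypothetical sketch; with a rankfile alone the `slots` map would be empty, so every host ends up in the "slot count unknown" bucket, which is exactly the missing warning he describes.

```python
from collections import Counter
from typing import Dict, List


def oversubscription_warnings(mapping: Dict[int, str],
                              slots: Dict[str, int]) -> List[str]:
    """Compare ranks per host with known slot counts and report problems."""
    warnings: List[str] = []
    per_host = Counter(mapping.values())
    for host, count in sorted(per_host.items()):
        if host not in slots:
            # A rankfile carries no slot counts, so this case cannot be
            # checked without a hostfile or RM allocation.
            warnings.append(f"{host}: {count} rank(s), slot count unknown")
        elif count > slots[host]:
            warnings.append(f"{host}: {count} rank(s) on {slots[host]} slot(s), oversubscribed")
    return warnings


print(oversubscription_warnings({0: "node0", 1: "node0", 2: "node1"}, {"node0": 1}))
# ['node0: 2 rank(s) on 1 slot(s), oversubscribed', 'node1: 1 rank(s), slot count unknown']
```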
On Sun, Jun 21, 2009 at 5:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
Let me suggest a two-step process, then:
1. Let's change the error message, as this is easily done and thus
can be done now.
2. I can look at how to eat the rankfile as a hostfile. This may
not even be possible - the problem is that the entire system is
predicated on certain ordering due to our framework architecture.
So we get an allocation, and then do a mapping against that
allocation, filtering the allocation through hostfiles, -host,
and other options.
By the time we reach the rankfile mapper, we have already
determined that we don't have an allocation and have to abort. It
is the rankfile mapper itself that looks for the -rankfile
option, so the system can have no knowledge that someone has
specified that option before that point - and thus, even if I
could parse the rankfile, I don't know it was given!
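The ordering problem described here can be illustrated with a toy model: the allocation is built first from the RM, a hostfile, and -host, and only afterwards does the rankfile mapper check each rank against it. Everything here (`toy_launch`, `AllocationError`) is hypothetical and much simpler than ORTE; the fallback to the local node is an assumption chosen to match Eugene's log, where only node1 was rejected.

```python
from typing import Dict, Sequence


class AllocationError(RuntimeError):
    """Raised when the toy launcher cannot place a rank."""


def toy_launch(rankfile: Dict[int, str],
               local_node: str = "node0",
               rm_nodes: Sequence[str] = (),
               hostfile: Sequence[str] = (),
               host: Sequence[str] = ()) -> Dict[int, str]:
    """Toy model of the ordering described above; not ORTE code."""
    # Stage 1, allocation: only the RM, a hostfile, and -host add nodes.
    # The rankfile is deliberately NOT an input to this stage.
    allocation = list(rm_nodes) + list(hostfile) + list(host)
    if not allocation:
        # Assumption: with no allocation source, fall back to the local node.
        allocation = [local_node]
    # Stage 2, mapping: the rankfile mapper can only use allocated nodes.
    for rank, node in sorted(rankfile.items()):
        if node not in allocation:
            raise AllocationError(f"rank {rank}: host {node} was not allocated")
    return dict(rankfile)


rf = {0: "node0", 1: "node1"}
try:
    toy_launch(rf)
except AllocationError as err:
    print(err)  # rank 1: host node1 was not allocated
print(toy_launch(rf, host=["node0", "node1"]))  # {0: 'node0', 1: 'node1'}
```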
What will take time is to figure out a way to either:
(a) allow us to run the mapper even though we don't have any
nodes we know about, and allow the mapper to insert the nodes
itself - without causing non-rankfile uses to break (which could
be a major feat); or
(b) have the overall system check for the rankfile option and
pass it as a hostfile as well, assuming that a hostfile wasn't
also given, no RM-based allocation exists, etc. - which breaks
our abstraction rules and also opens a possible can of worms.
Either way, I also then have to teach the hostfile parser how to
realize it is a rankfile format and convert the info in it into
what we expected to receive from a hostfile - another non-trivial
problem.
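Option (b) implies the hostfile parser must recognize a rankfile and turn it into allocation information. A heuristic sketch, assuming only the `rank N=host slot=S` line form seen in this thread (no relative `+n` hosts, no other directives), might count ranks per host and report that as the slot count; `looks_like_rankfile` and `rankfile_to_hostfile` are hypothetical names. Deriving slots this way says nothing about the real core count, which is Lenny's oversubscription concern.

```python
import re
from typing import Dict, List

# Matches the rankfile form used in this thread, e.g. "rank 0=node0 slot=0".
RANKFILE_LINE = re.compile(r"^\s*rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)\s*$")


def looks_like_rankfile(lines: List[str]) -> bool:
    """Heuristic: every non-blank, non-comment line parses as a rankfile entry."""
    body = [line for line in lines if line.strip() and not line.lstrip().startswith("#")]
    return bool(body) and all(RANKFILE_LINE.match(line) for line in body)


def rankfile_to_hostfile(lines: List[str]) -> List[str]:
    """Derive hostfile-style lines: one per host, slots = ranks placed there."""
    counts: Dict[str, int] = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        m = RANKFILE_LINE.match(line)
        if m:
            host = m.group(2)
            counts[host] = counts.get(host, 0) + 1
    return [f"{host} slots={n}" for host, n in counts.items()]


rf = ["rank 0=node0 slot=0", "rank 1=node1 slot=0"]
print(looks_like_rankfile(rf))   # True
print(rankfile_to_hostfile(rf))  # ['node0 slots=1', 'node1 slots=1']
```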
I'm willing to give it a try - just trying to make clear why my
response was negative. It isn't as simple as it sounds...which is
why Len and I didn't pursue it when this was originally developed.
Ralph
On Sun, Jun 21, 2009 at 5:28 AM, Terry Dontje <terry.don...@sun.com> wrote:
Being a part of these discussions, I can understand your
reticence to reopen this discussion. However, I think this is a
major usability issue with a feature that is actually fairly
important for getting things to run with good performance, which
IMO is important.
That being said, I think there are two things that could be done
to mitigate the issue:
1. Eliminate the element of surprise by changing mpirun to eat
the rankfile without the hostfile.
2. Change the error message to something understandable by the
user, so that they know they might be missing the hostfile option.
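For the error-message option, here is a sketch of a check that names the unallocated hosts and points at the likely fix. The wording and the helper `check_rankfile_hosts` are only a suggestion, not Open MPI's actual message; `-host` and hostfiles are the mechanisms already discussed in this thread.

```python
from typing import Dict, List, Optional


def check_rankfile_hosts(rankfile: Dict[int, str],
                         allocation: List[str]) -> Optional[str]:
    """Return a suggested user-facing message, or None if all hosts are allocated."""
    missing = sorted({host for host in rankfile.values() if host not in allocation})
    if not missing:
        return None
    return (
        "The rankfile places processes on host(s) that are not part of the "
        f"allocation: {', '.join(missing)}.\n"
        "A rankfile only places processes; it does not allocate nodes. "
        "Please also list these hosts with -host or in a hostfile, or run "
        "inside a resource-manager allocation that includes them."
    )


print(check_rankfile_hosts({0: "node0", 1: "node1"}, ["node0"]))
```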
Again, I understand this topic is frustrating, and there are
some boundaries in the design that make these two options
orthogonal to each other, but I really believe we need to make
the rankfile option something that is easily usable by our users.
Ralph Castain wrote:
Having gone around in circles on hostfile-related issues for
over five years now, I honestly have little motivation to
re-open the entire discussion again. It doesn't seem to be that
daunting a requirement for those who are using it, so I'm
inclined to just leave well enough alone.
:-)
On Fri, Jun 19, 2009 at 2:21 PM, Eugene Loh <eugene....@sun.com> wrote:
Ralph Castain wrote:
The two files have a slightly different format
Agreed.
and completely different meaning.
Somewhat agreed. They're both related to mapping processes onto a
cluster.
The hostfile specifies how many slots are on a node. The rankfile
specifies a rank and what node/slot it is to be mapped onto.
Agreed.
Rankfiles can use relative node indexing and refer to nodes
received from a resource manager - i.e., without any hostfile.
This is the main part I'm concerned about. E.g.,
% cat rankfile
rank 0=node0 slot=0
rank 1=node1 slot=0
% mpirun -np 2 -rf rankfile ./a.out
--------------------------------------------------------------------------
Rankfile claimed host node1 that was not allocated or oversubscribed it's slots:
--------------------------------------------------------------------------
[node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 107
[node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
[node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
[node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016
% mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
0 on node0
1 on node1
done
It seems to me that the rankfile has sufficient information to
express what I want it to do. But mpirun won't accept this. To
fix this, I have to, e.g., supply/maintain/specify redundant
information in a hostfile or host list.
So the files are intentionally quite different. Trying to combine
them would be rather ugly.
Right. And my issue is that I'm forced to use both when I only
want rankfile functionality.
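Ralph's point about relative node indexing is why a rankfile cannot always stand alone: the mpirun man page describes relative hosts of the form `+n<X>`, meaning the X-th host (0-based) of an externally supplied list. The sketch below, with a hypothetical `resolve_host` helper, shows that such names only resolve once an allocation exists.

```python
import re
from typing import List

RELATIVE = re.compile(r"^\+n(\d+)$")


def resolve_host(name: str, allocation: List[str]) -> str:
    """Map a rankfile host field to a real hostname.

    "+n<X>" means the X-th host (0-based) of the externally supplied
    allocation; anything else is taken as an absolute hostname.
    """
    m = RELATIVE.match(name)
    if m is None:
        return name
    index = int(m.group(1))
    if index >= len(allocation):
        raise IndexError(f"{name} refers past the {len(allocation)}-node allocation")
    return allocation[index]


print(resolve_host("+n1", ["node0", "node1"]))  # node1
print(resolve_host("node0", []))                # node0
```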
On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh <eugene....@sun.com> wrote:
In order to use "mpirun --rankfile", I also need to specify
hosts/hostlist. But that information is redundant with what I
provide in the rankfile. So, from a user's point of view, this
strikes me as broken. Yes? Should I file a ticket, or am I
missing something here about this functionality?
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel