Let us think about this some more.  We'll try and reply later today.

--td

Ralph Castain wrote:
Had a chance to think about how this might be done, and looked at it for a while after getting home. I -think- I found a way to do it, but there are a couple of caveats:

1. Len's point about oversubscribing without warning would definitely hold true - this would positively be a "user beware" option

2. There could be no RM-provided allocation, hostfile, or -host options specified. Basically, I would be adding the "read rankfile" option to the end of the current allocation determination procedure.

I would still allow more procs than shown in the rankfile (mapping the rest bynode on the nodes specified in the rankfile - can't do byslot because I don't know how many slots are on each node), which means the only change in behavior would be the forced bynode mapping of unspecified procs.
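To make the proposed "bynode" fallback concrete, here is a minimal sketch of how ranks missing from the rankfile could be spread round-robin over the nodes the rankfile names. It is not Open MPI's mapper: the function name map_extra_bynode is hypothetical, and both the starting node and the node order (first appearance in the rankfile) are assumptions. It also assumes every line has the form "rank N=host slot=S" with an absolute host name.

```shell
# map_extra_bynode NP RANKFILE   (hypothetical helper, illustration only)
# Prints "rank host" for ranks 0..NP-1. Ranks listed in the rankfile keep
# their host; the rest are assigned round-robin ("bynode") over the hosts
# named in the rankfile, in order of first appearance (an assumption).
map_extra_bynode() {
    awk -v np="$1" '
        $1 == "rank" {
            split($2, kv, "=")              # $2 looks like 0=node0
            placed[kv[1] + 0] = kv[2]       # remember the explicit placement
            if (!(kv[2] in seen)) {         # record each host once
                seen[kv[2]] = 1
                nodes[n++] = kv[2]
            }
        }
        END {
            if (n == 0) exit 1              # no hosts named: nothing to map onto
            next_node = 0
            for (r = 0; r < np; r++) {
                if (r in placed) {
                    print r, placed[r]
                } else {
                    print r, nodes[next_node % n]
                    next_node++
                }
            }
        }
    ' "$2"
}
```

Because the rankfile carries no slot counts, nothing in this sketch can detect oversubscription, which is exactly caveat 1 above.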

So use of this option will entail some risks and a slight difference in behavior, but would relieve you from the burden of having to provide a hostfile. I'm not personally convinced it is worth the risk and probable user complaints of "it didn't work", but since we don't use this option, I don't have a strong opinion on the matter.

Let's just avoid going back-and-forth over wanting it, or how it should be implemented - let's get it all ironed out, and then implement it once, like we finally did at the end with the whole hostfile thing.

Let me know if you want me to do this - it obviously isn't at the top of my priority list, but still could be done in the next few weeks.

Ralph


On Jun 21, 2009, at 9:00 AM, Lenny Verkhovsky wrote:

Sorry for the delay in response. I totally agree with Ralph that it's not as easy as it seems:

1. The rankfile mapper uses already-allocated machines (by scheduler or hostfile). By using the rankfile as a hostfile, we can run into a problem where we try to use unallocated nodes, which can hang the run.

2. We can't define the number of slots on each machine in the rankfile, which means oversubscribing can take place without any warning.

3. I personally don't see any problem with using a hostfile, even if it has redundant info. The hostfile and rankfile belong to different layers in the system and solve different problems. The original hostfile (if I recall correctly) could bind a rank to a node, but the syntax wasn't very flexible or clear.
Lenny.
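
For users who keep supplying the host list alongside the rankfile, the redundant part can at least be derived rather than maintained by hand. The following is a minimal sketch, not part of Open MPI: the function name rankfile_hosts is hypothetical, and it assumes every mapping line has the form "rank N=host slot=S" with an absolute host name (entries using relative node indexing are not handled).

```shell
# rankfile_hosts RANKFILE   (hypothetical helper, illustration only)
# Prints the unique hosts named in a rankfile as a comma-separated list,
# in order of first appearance, suitable for passing to mpirun -host.
rankfile_hosts() {
    awk '
        $1 == "rank" {
            split($2, kv, "=")              # $2 looks like 0=node0
            host = kv[2]
            if (host != "" && !(host in seen)) {
                seen[host] = 1
                list = (list == "" ? host : list "," host)
            }
        }
        END { print list }
    ' "$1"
}
```

With the two-line rankfile quoted further down in this thread, mpirun -np 2 -host "$(rankfile_hosts rankfile)" -rf rankfile ./a.out would expand to the same -host node0,node1 invocation that is shown working there. It does nothing about slot counts, so point 2 still applies.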

On Sun, Jun 21, 2009 at 5:15 PM, Ralph Castain <r...@open-mpi.org> wrote:

    Let me suggest a two-step process, then:

    1. let's change the error message as this is easily done and thus
    can be done now

    2. I can look at how to eat the rankfile as a hostfile. This may
    not even be possible - the problem is that the entire system is
    predicated on certain ordering due to our framework architecture.
    So we get an allocation, and then do a mapping against that
    allocation, filtering the allocation through hostfiles, -host,
    and other options.

    By the time we reach the rankfile mapper, we have already
    determined that we don't have an allocation and have to abort. It
    is the rankfile mapper itself that looks for the -rankfile
    option, so the system can have no knowledge that someone has
    specified that option before that point - and thus, even if I
    could parse the rankfile, I don't know it was given!

    What will take time is to figure out a way to either:

    (a) allow us to run the mapper even though we don't have any
    nodes we know about, and allow the mapper to insert the nodes
    itself - without causing non-rankfile uses to break (which could
    be a major feat); or

    (b) have the overall system check for the rankfile option and
    pass it as a hostfile as well, assuming that a hostfile wasn't
    also given, no RM-based allocation exists, etc. - which breaks
    our abstraction rules and also opens a possible can of worms.

    Either way, I also then have to teach the hostfile parser how to
    realize it is a rankfile format and convert the info in it into
    what we expected to receive from a hostfile - another non-trivial
    problem.

    I'm willing to give it a try - just trying to make clear why my
    response was negative. It isn't as simple as it sounds...which is
    why Len and I didn't pursue it when this was originally developed.

    Ralph


    On Sun, Jun 21, 2009 at 5:28 AM, Terry Dontje
    <terry.don...@sun.com> wrote:

        Having been part of these discussions, I can understand your
        reticence to reopen this one.  However, I think this is a major
        usability issue with a feature that is actually fairly important
        for getting good performance, which IMO matters.

        That being said, I think one of two things could be done to
        mitigate the issue:

        1.  Eliminate the element of surprise by changing mpirun
        to eat the rankfile without the hostfile.
        2.  Change the error message to something understandable
        by the user, so that they know they might be missing the
        hostfile option.

        Again, I understand this topic is frustrating and there are
        some boundaries with the design that make these two options
        orthogonal to each other, but I really believe we need to make
        the rankfile option something that is easily usable by our users.


        --td

        Ralph Castain wrote:

            Having gone around in circles on hostfile-related issues
            for over five years now, I honestly have little
            motivation to re-open the entire discussion again. It
            doesn't seem to be that daunting a requirement for those
            who are using it, so I'm inclined to just leave well
            enough alone.

            :-)


            On Fri, Jun 19, 2009 at 2:21 PM, Eugene Loh
            <eugene....@sun.com> wrote:

               Ralph Castain wrote:

                   The two files have a slightly different format

               Agreed.

                   and completely different meaning.

               Somewhat agreed.  They're both related to mapping
               processes onto a cluster.

                   The hostfile specifies how many slots are on a node.
                   The rankfile specifies a rank and what node/slot it is
                   to be mapped onto.

               Agreed.

                   Rankfiles can use relative node indexing and refer to
                   nodes received from a resource manager - i.e., without
                   any hostfile.

               This is the main part I'm concerned about.  E.g.,

               % cat rankfile
               rank 0=node0 slot=0
               rank 1=node1 slot=0
               % mpirun -np 2 -rf rankfile ./a.out
               --------------------------------------------------------------------------
               Rankfile claimed host node1 that was not allocated or
               oversubscribed it's slots:

               --------------------------------------------------------------------------
               [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 107
               [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
               [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
               [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016
               % mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
               0 on node0
               1 on node1
               done

               It seems to me that the rankfile has sufficient information
               to express what I want it to do.  But mpirun won't accept
               this.  To fix this, I have to, e.g., supply/maintain/specify
               redundant information in a hostfile or host list.

                   So the files are intentionally quite different. Trying
                   to combine them would be rather ugly.

               Right.  And my issue is that I'm forced to use both when I
               only want rankfile functionality.

                   On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh
                   <eugene....@sun.com> wrote:

                       In order to use "mpirun --rankfile", I also need to
                       specify hosts/hostlist.  But that information is
                       redundant with what I provide in the rankfile.  So,
                       from a user's point of view, this strikes me as
                       broken.  Yes?  Should I file a ticket, or am I
                       missing something here about this functionality?


               _______________________________________________
               devel mailing list
               de...@open-mpi.org

               http://www.open-mpi.org/mailman/listinfo.cgi/devel


            