Yeah, I've been fighting with myself on whether providing a new interface that combines the two makes sense or just muddies the water more. I've also been wondering whether providing a generalized method/option to describe binding patterns would mitigate this issue by removing the need for a rankfile at all.

--td

Mike Dubman wrote:
Just an idea: maybe it is worth providing a brand new command-line option to mpirun. This option would accept a filename and support a combined syntax for machinefile/hostfile (to define allocations) and rankfile (to define placement).

YAML syntax can be used in order to describe file primitives (http://www.yaml.org/start.html)

for example:


$ mpirun -clusterfile /path/to/clusterfile
$ cat clusterfile
hostX:
       slots    : int
       maxslots : int
       ranks    : rankid[@socket:core]


example of clusterfile
===============

hostX:
       slots    : 4
       maxslots : 4
       ranks    : 1,16,22

hostY:
       slots    : 8
       maxslots : 8
       ranks    : 1@0:*, 3@2-3, 4@0:1, 5


By doing so, we keep backwards compatibility.
After reading the clusterfile, the code should perform the *hostfile* and *rankfile* parts as it does today. What do you think?
Mike
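
A minimal sketch of the "read the clusterfile, then hand off the hostfile and rankfile parts" idea, written in Python. Everything here is an assumption for illustration: -clusterfile is only the option proposed in this message, not an existing mpirun option, and split_clusterfile is a hypothetical helper. The parser handles only the simple indented "key : value" layout shown above rather than full YAML, and it does not check whether the real rankfile parser accepts an entry without a slot.

```python
def split_clusterfile(text):
    """Split a proposed clusterfile into hostfile-style and rankfile-style lines."""
    hosts = {}       # host name -> {attribute: value}, kept in file order
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not raw[0].isspace():
            # An unindented "name:" line starts a new host block.
            current = hosts.setdefault(key, {})
        elif current is None:
            raise ValueError("attribute before any host: " + line)
        else:
            current[key] = value

    hostfile, rankfile = [], []
    for host, attrs in hosts.items():
        entry = host
        if "slots" in attrs:
            entry += " slots=" + attrs["slots"]
        if "maxslots" in attrs:
            # Assumed mapping of the proposed "maxslots" key to max_slots.
            entry += " max_slots=" + attrs["maxslots"]
        hostfile.append(entry)
        for item in attrs.get("ranks", "").split(","):
            item = item.strip()
            if not item:
                continue
            # "rankid[@socket:core]": the part after "@" is passed through as-is.
            rank, at, slot = item.partition("@")
            line = "rank %d=%s" % (int(rank), host)
            if at:
                line += " slot=" + slot
            rankfile.append(line)
    return hostfile, rankfile


if __name__ == "__main__":
    sample = "hostY:\n  slots : 8\n  maxslots : 8\n  ranks : 3@2-3, 4@0:1, 5\n"
    for part in split_clusterfile(sample):
        print("\n".join(part))
```

Keeping the two outputs separate mirrors the point above: the existing hostfile and rankfile code paths would stay unchanged and only the front end would be new.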



On Mon, Jun 22, 2009 at 1:30 PM, Terry Dontje <terry.don...@sun.com> wrote:

    Let us think about this some more.  We'll try and reply later today.

    --td

    Ralph Castain wrote:

        Had a chance to think about how this might be done, and looked
        at it for awhile after getting home. I -think- I found a way
        to do it, but there are a couple of caveats:

        1. Len's point about oversubscribing without warning would
        definitely hold true - this would positively be a "user
        beware" option

        2. there could be no RM-provided allocation, hostfile, or
        -host options specified. Basically, I would be adding the
        "read rankfile" option to the end of the current allocation
        determination procedure

        I would still allow more procs than shown in the rankfile
        (mapping the rest bynode on the nodes specified in the
        rankfile - can't do byslot because I don't know how many slots
        are on each node), which means the only change in behavior
        would be the forced bynode mapping of unspecified procs.

        So use of this option will entail some risks and a slight
        difference in behavior, but would relieve you from the burden
        of having to provide a hostfile. I'm not personally convinced
        it is worth the risk and probable user complaints of "it
        didn't work", but since we don't use this option, I don't have
        a strong opinion on the matter.

        Let's just avoid going back-and-forth over wanting it, or how
        it should be implemented - let's get it all ironed out, and
        then implement it once, like we finally did at the end with
        the whole hostfile thing.

        Let me know if you want me to do this - it obviously isn't at
        the top of my priority list, but still could be done in the
        next few weeks.

        Ralph
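
The "map the rest bynode" behavior described above can be sketched as a simple round-robin over the nodes named in the rankfile. This is a Python illustration under stated assumptions, not ORTE code: map_remaining_bynode is a hypothetical name, and the exact order in which the real mapper would visit nodes is not given in this thread.

```python
def map_remaining_bynode(assigned, nodes, np):
    """Place ranks missing from the rankfile round-robin over its nodes.

    assigned: dict of rank -> node taken from the rankfile
    nodes:    ordered list of nodes named in the rankfile
    np:       total number of processes requested
    """
    mapping = dict(assigned)
    # Ranks the rankfile did not mention, in ascending order.
    free = [rank for rank in range(np) if rank not in mapping]
    for i, rank in enumerate(free):
        # One rank per node, wrapping around. No slot counts are consulted,
        # so oversubscription is not detected (caveat 1 above).
        mapping[rank] = nodes[i % len(nodes)]
    return mapping
```

With two ranks pinned in the rankfile and -np 5, the three extra ranks would alternate between the two nodes.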


        On Jun 21, 2009, at 9:00 AM, Lenny Verkhovsky wrote:

            Sorry for the delay in response. I totally agree with
            Ralph that it's not as easy as it seems:
            1. The rankfile mapper uses already allocated machines (by
            scheduler or hostfile). By using the rankfile as a hostfile
            we can run into a problem where we try to use unallocated
            nodes, which can hang the run.
            2. We can't define the number of slots on each machine in
            the rankfile, which means oversubscribing can take place
            without any warning.
            3. I personally don't see any problem with using a hostfile,
            even if it has redundant info. The hostfile and rankfile
            belong to different layers in the system and solve different
            problems. The original hostfile (if I recall correctly)
            could bind a rank to a node, but the syntax wasn't very
            flexible or clear.
            Lenny.

            On Sun, Jun 21, 2009 at 5:15 PM, Ralph Castain
            <r...@open-mpi.org> wrote:

               Let me suggest a two-step process, then:

               1. let's change the error message as this is easily done
               and thus can be done now

               2. I can look at how to eat the rankfile as a hostfile.
               This may not even be possible - the problem is that the
               entire system is predicated on certain ordering due to our
               framework architecture. So we get an allocation, and then
               do a mapping against that allocation, filtering the
               allocation through hostfiles, -host, and other options.

               By the time we reach the rankfile mapper, we have already
               determined that we don't have an allocation and have to
               abort. It is the rankfile mapper itself that looks for the
               -rankfile option, so the system can have no knowledge that
               someone has specified that option before that point - and
               thus, even if I could parse the rankfile, I don't know it
               was given!

               What will take time is to figure out a way to either:

               (a) allow us to run the mapper even though we don't have
               any nodes we know about, and allow the mapper to insert the
               nodes itself - without causing non-rankfile uses to break
               (which could be a major feat); or

               (b) have the overall system check for the rankfile option
               and pass it as a hostfile as well, assuming that a hostfile
               wasn't also given, no RM-based allocation exists, etc. -
               which breaks our abstraction rules and also opens a
               possible can of worms.

               Either way, I also then have to teach the hostfile parser
               how to realize it is a rankfile format and convert the info
               in it into what we expected to receive from a hostfile -
               another non-trivial problem.

               I'm willing to give it a try - just trying to make clear
               why my response was negative. It isn't as simple as it
               sounds...which is why Len and I didn't pursue it when this
               was originally developed.

               Ralph


               On Sun, Jun 21, 2009 at 5:28 AM, Terry Dontje
               <terry.don...@sun.com> wrote:

                   Being a part of these discussions I can understand your
                   reticence to reopen this discussion.  However, I think
                   this is a major usability issue with a feature that is
                   fairly important for getting things to run with good
                   performance, which IMO matters.

                   That being said, I think one of two things could be
                   done to mitigate the issue.

                   1.  To eliminate the element of surprise by changing
                   mpirun to eat rankfile without the hostfile.
                   2.  To change the error message to something
                   understandable by the user such that they know they
                   might be missing the hostfile option.

                   Again, I understand this topic is frustrating and there
                   are some boundaries with the design that make these two
                   options orthogonal to each other, but I really believe
                   we need to make the rankfile option something that is
                   easily usable by our users.


                   --td

                   Ralph Castain wrote:

                       Having gone around in circles on hostfile-related
                       issues for over five years now, I honestly have
                       little motivation to re-open the entire discussion
                       again. It doesn't seem to be that daunting a
                       requirement for those who are using it, so I'm
                       inclined to just leave well enough alone.

                       :-)


                       On Fri, Jun 19, 2009 at 2:21 PM, Eugene Loh
                       <eugene....@sun.com> wrote:

                          Ralph Castain wrote:

                              The two files have a slightly different format

                          Agreed.

                              and completely different meaning.

                          Somewhat agreed.  They're both related to mapping
                          processes onto a cluster.

                              The hostfile specifies how many slots are on a
                              node. The rankfile specifies a rank and what
                              node/slot it is to be mapped onto.

                          Agreed.

                              Rankfiles can use relative node indexing and
                              refer to nodes received from a resource
                              manager - i.e., without any hostfile.

                          This is the main part I'm concerned about.  E.g.,

                          % cat rankfile
                          rank 0=node0 slot=0
                          rank 1=node1 slot=0
                          % mpirun -np 2 -rf rankfile ./a.out
                          --------------------------------------------------------------------------
                          Rankfile claimed host node1 that was not allocated or
                          oversubscribed it's slots:

                          --------------------------------------------------------------------------
                          [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 107
                          [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
                          [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
                          [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016
                          % mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
                          0 on node0
                          1 on node1
                          done

                          It seems to me that the rankfile has sufficient
                          information to express what I want it to do.  But
                          mpirun won't accept this.  To fix this, I have to,
                          e.g., supply/maintain/specify redundant
                          information in a hostfile or host list.
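
As a user-side stopgap for the redundancy described above, the host list can be generated from the rankfile instead of being maintained by hand. The Python sketch below is an assumption-laden illustration: hosts_from_rankfile is a hypothetical helper, it only understands lines of the "rank N=host ..." form shown in this message, and it skips relative entries such as +n0 because those refer to an existing allocation rather than naming a host.

```python
import re


def hosts_from_rankfile(text):
    """Return the distinct host names in a rankfile, in first-seen order."""
    hosts = []
    for line in text.splitlines():
        match = re.match(r"\s*rank\s+\d+\s*=\s*(\S+)", line)
        if not match:
            continue
        host = match.group(1)
        # Relative node indexes (+n0, +n1, ...) are not host names.
        if host.startswith("+"):
            continue
        if host not in hosts:
            hosts.append(host)
    return hosts


if __name__ == "__main__":
    rankfile = "rank 0=node0 slot=0\nrank 1=node1 slot=0\n"
    # prints: -host node0,node1
    print("-host " + ",".join(hosts_from_rankfile(rankfile)))
```

The printed string matches the -host argument used in the working mpirun command above, so a wrapper script could pass both it and -rf rankfile to mpirun.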

                              So the files are intentionally quite different.
                              Trying to combine them would be rather ugly.

                          Right.  And my issue is that I'm forced to use both
                          when I only want rankfile functionality.

                              On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh
                              <eugene....@sun.com> wrote:

                                  In order to use "mpirun --rankfile", I also
                                  need to specify hosts/hostlist.  But that
                                  information is redundant with what I
                                  provide in the rankfile.  So, from a user's
                                  point of view, this strikes me as broken.
                                  Yes?  Should I file a ticket, or am I
                                  missing something here about this
                                  functionality?


                          _______________________________________________
                          devel mailing list
                          de...@open-mpi.org
                          http://www.open-mpi.org/mailman/listinfo.cgi/devel

