I think that as long as there is a single home area per cluster, the difference between the approaches might seem irrelevant to most people.
My problem is twofold. First, I have a common home area across several different development clusters, so I have direct access through ssh to any machine. If I create a single large machinefile, it turns out that every mpirun will spawn a daemon on every single node, even if I only run a ping-pong test. Second, while I usually run my apps on the same set of resources, I regularly need to switch nodes for a few tests.

What I was hoping to achieve is a machinefile containing the "default" development cluster (i.e. the cluster where I'm almost alone, so my daemons have minimal chances of disturbing other people's work), and then use a machinefile to sporadically change the cluster where I run for smaller tests. Unfortunately, this doesn't work due to the filtering behavior described in my original email.

  george.

On Jul 28, 2012, at 19:24 , Ralph Castain wrote:

> It's been awhile, but I vaguely remember the discussion. IIRC, the rationale
> was that the default hostfile was equivalent to an RM allocation and should
> be treated the same. So hostfile and -host become filters in that case.
>
> FWIW, I believe the discussion was split on that question. I added a "none"
> option to the default hostfile MCA param so it would be ignored in the case
> where (a) the sys admin has given a default hostfile, but (b) someone wants
> to use hosts outside of it.
>
> MCA orte: parameter "orte_default_hostfile" (current value:
> <none>, data source: default value)
> Name of the default hostfile (relative or absolute
> path, "none" to ignore environmental or default MCA param setting)
>
> That said, I can see a use-case argument for behaving somewhat differently.
> We've even had cases where users have gotten an allocation from an RM, but
> want to add hosts that are external to the cluster to the job.
>
> It would be rather trivial to modify the logic:
>
> 1. read the default hostfile or RM allocation for our baseline
>
> 2. remove any hosts on that list that are *not* in the given hostfile
>
> 3. add any hosts that are in the given hostfile, but weren't in the default
> hostfile
>
> And subsequently do the same for -host. I think that would retain the spirit
> of the discussion, but provide more flexibility and provide a tad more
> "expected" behavior.
>
> I don't have an iron in this fire as I don't use hostfiles, so I'm happy to
> implement whatever the community would like to see.
> Ralph
>
> On Jul 27, 2012, at 6:30 PM, George Bosilca wrote:
>
>> I'm somewhat puzzled by the behavior of the -hostfile in Open MPI. Based on
>> the FAQ it is supposed to provide a list of resources to be used by the
>> launcher (in my case ssh) to start the processes. Make sense so far.
>>
>> However, if the configuration file contain a value for
>> orte_default_hostfile, then the behavior of the hostfile option change
>> drastically, and the option become a filter (the machines must be on the
>> original list or a cryptic error message is displayed).
>>
>> Overall, we have a well defined [mostly] consistent behavior for parameters
>> in Open MPI. We have an order of precedence of sources of MCA parameters,
>> clearly defined which make understanding where a value comes
>> straightforward. I'm absolutely certain there was a group discussion about
>> this unique "eccentricity" regarding the hostfile option, but I fail to
>> remember what was the reason we decided to go this way. Can I have a quick
>> refresh please?
>>
>> Thanks,
>> george.
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
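
For reference, the three-step logic proposed in Ralph's quoted message (baseline, then drop, then add, repeated for -host) can be sketched roughly as follows. This is only an illustration in Python, not Open MPI code: the function name `merge_hosts`, the host names, and the choice to return the baseline unchanged when no hosts are requested are all assumptions made for the example, and slot counts are ignored.

```python
def merge_hosts(baseline, requested):
    """Apply the quoted three steps to two ordered lists of host names.

    baseline  -- hosts from the default hostfile or RM allocation (step 1)
    requested -- hosts from the given hostfile (or from -host)
    """
    # Assumption: with nothing requested, the baseline is used as-is.
    if not requested:
        return list(baseline)

    requested_set = set(requested)
    baseline_set = set(baseline)

    # Step 2: remove baseline hosts that are not in the requested list.
    kept = [h for h in baseline if h in requested_set]

    # Step 3: add requested hosts that were not in the baseline,
    # preserving their order and skipping repeats.
    added = []
    for h in requested:
        if h not in baseline_set and h not in added:
            added.append(h)

    return kept + added


# Hypothetical host names, purely for illustration.
default_hosts = ["dev01", "dev02", "dev03"]   # default hostfile
hostfile_hosts = ["dev02", "other01"]         # -hostfile contents
host_option = ["other01"]                     # -host argument

after_hostfile = merge_hosts(default_hosts, hostfile_hosts)
final = merge_hosts(after_hostfile, host_option)

print(after_hostfile)  # ['dev02', 'other01']
print(final)           # ['other01']
```

Under these assumptions a host outside the default hostfile ("other01") is accepted rather than rejected, which is the difference from the current filtering behavior.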