I think that, as long as there is a single home area per cluster, the 
difference between the approaches may seem irrelevant to most people.

My problem is twofold. First, I have a common home area across several 
different development clusters, so I have direct ssh access to any machine. 
If I create a single large machinefile, every mpirun spawns a daemon on every 
node in it, even if I only run a ping-pong test. Second, while I usually run 
my apps on the same set of resources, I regularly need to switch nodes for a 
few tests.

What I was hoping to achieve is a default machinefile containing the "default" 
development cluster (i.e., the cluster where I am almost alone, so my daemons 
have minimal chance of disturbing other people's work), plus an explicit 
machinefile to occasionally switch to a different cluster for smaller tests. 
Unfortunately, this doesn't work because of the filtering behavior described 
in my original email.
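To make the problem concrete, here is a minimal Python sketch of the filtering semantics as described in this thread: when orte_default_hostfile is set, its hosts act as the allocation, and a -hostfile/-host list can only select from them. The function name, the host names, and raising ValueError in place of mpirun's error message are all hypothetical; this models the behavior, it is not Open MPI's code.

```python
def filter_allocation(default_hosts, requested_hosts):
    """Filtering semantics as described in this thread: the default hostfile
    acts like an RM allocation, and a -hostfile/-host list may only select
    hosts that are already in it."""
    allowed = set(default_hosts)
    outside = [h for h in requested_hosts if h not in allowed]
    if outside:
        # mpirun reports an error here; raising ValueError is a stand-in.
        raise ValueError(f"not in the default allocation: {outside}")
    return list(requested_hosts)


# Hypothetical host names: "devA*" is the quiet default cluster, "devB*" is
# another cluster reachable through the shared home area.
default_cluster = ["devA01", "devA02", "devA03"]

print(filter_allocation(default_cluster, ["devA01", "devA02"]))
# → ['devA01', 'devA02']

try:
    filter_allocation(default_cluster, ["devB01", "devB02"])
except ValueError as err:
    print(err)
# → not in the default allocation: ['devB01', 'devB02']
```

Selecting a subset of the default cluster works, but switching a small test to another cluster is rejected, which is exactly the case described above.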

  george.


On Jul 28, 2012, at 19:24 , Ralph Castain wrote:

> It's been a while, but I vaguely remember the discussion. IIRC, the rationale 
> was that the default hostfile was equivalent to an RM allocation and should 
> be treated the same way. So -hostfile and -host become filters in that case.
> 
> FWIW, I believe the discussion was split on that question. I added a "none" 
> option to the default hostfile MCA param so it would be ignored in the case 
> where (a) the sys admin has given a default hostfile, but (b) someone wants 
> to use hosts outside of it.
> 
>                MCA orte: parameter "orte_default_hostfile" (current value: <none>, data source: default value)
>                          Name of the default hostfile (relative or absolute path, "none" to ignore environmental or default MCA param setting)
> 
> That said, I can see a use-case argument for behaving somewhat differently. 
> We've even had cases where users have gotten an allocation from an RM, but 
> want to add hosts that are external to the cluster to the job.
> 
> It would be rather trivial to modify the logic:
> 
> 1. read the default hostfile or RM allocation for our baseline
> 
> 2. remove any hosts on that list that are *not* in the given hostfile
> 
> 3. add any hosts that are in the given hostfile, but weren't in the default 
> hostfile
> 
> And subsequently do the same for -host. I think that would retain the spirit 
> of the discussion, while providing more flexibility and a tad more 
> "expected" behavior.
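The three steps above can be sketched in Python as follows, assuming a host list is an ordered list of unique host names (slot counts and other per-host attributes are ignored). The function and host names are hypothetical. One consequence of this sketch: by host name alone, steps 2 and 3 together always yield exactly the hosts in the given list; the baseline only determines the order of the hosts it shares with that list, and new hosts are appended after them.

```python
def merge_allocation(baseline, given):
    """Proposed logic: baseline is the default hostfile (or RM allocation),
    given is the user's -hostfile or -host list. Assumes each host appears
    once per list and ignores slot counts."""
    given_set = set(given)
    baseline_set = set(baseline)
    # Step 2: keep only baseline hosts that also appear in the given list.
    kept = [h for h in baseline if h in given_set]
    # Step 3: append given hosts that were not in the baseline.
    added = [h for h in given if h not in baseline_set]
    return kept + added


default_cluster = ["devA01", "devA02", "devA03"]  # hypothetical default hostfile

print(merge_allocation(default_cluster, ["devB01", "devA03", "devA01"]))
# → ['devA01', 'devA03', 'devB01']

# "Do the same for -host": apply the merge again to the previous result.
step1 = merge_allocation(default_cluster, ["devA01", "devB01"])
print(merge_allocation(step1, ["devB01"]))
# → ['devB01']
```

The sketch assumes the merge runs only when a hostfile or -host list is actually supplied; otherwise the baseline is used unchanged.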
> 
> I don't have an iron in this fire as I don't use hostfiles, so I'm happy to 
> implement whatever the community would like to see.
> Ralph
> 
> On Jul 27, 2012, at 6:30 PM, George Bosilca wrote:
> 
>> I'm somewhat puzzled by the behavior of the -hostfile option in Open MPI. 
>> According to the FAQ, it is supposed to provide a list of resources to be 
>> used by the launcher (in my case ssh) to start the processes. Makes sense 
>> so far.
>> 
>> However, if the configuration file contains a value for 
>> orte_default_hostfile, then the behavior of the hostfile option changes 
>> drastically, and the option becomes a filter (the machines must be on the 
>> original list, or a cryptic error message is displayed).
>> 
>> Overall, we have well-defined, [mostly] consistent behavior for parameters 
>> in Open MPI. The sources of MCA parameters follow a clearly defined order of 
>> precedence, which makes it straightforward to understand where a value comes 
>> from. I'm absolutely certain there was a group discussion about this unique 
>> "eccentricity" regarding the hostfile option, but I can't remember why we 
>> decided to go this way. Can I have a quick refresher, please?
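The precedence rule mentioned here can be read as "the first source that defines a parameter wins". A minimal Python sketch, assuming four sources ordered command line, environment, parameter file, built-in default; Open MPI's real list of sources may be longer, and the source labels, the file path, and the resolve function are hypothetical. Under this rule a value set closer to the user simply replaces one from a file, which is the behavior the hostfile option departs from.

```python
# Assumed precedence, highest first (a simplification of Open MPI's sources).
PRECEDENCE = ("command_line", "environment", "param_file", "default")


def resolve(name, sources):
    """Return (value, source) from the highest-precedence source that
    defines the parameter; raise KeyError if none does."""
    for source in PRECEDENCE:
        values = sources.get(source, {})
        if name in values:
            return values[name], source
    raise KeyError(name)


# Hypothetical settings: a parameter file names a default hostfile, and the
# user overrides it on the command line with the "none" value.
sources = {
    "param_file": {"orte_default_hostfile": "/path/to/default-hostfile"},
    "command_line": {"orte_default_hostfile": "none"},
}
print(resolve("orte_default_hostfile", sources))
# → ('none', 'command_line')
```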
>> 
>> Thanks,
>> george.
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 

