It's been a while, but I vaguely remember the discussion. IIRC, the rationale 
was that the default hostfile was equivalent to an RM allocation and should be 
treated the same. So hostfile and -host become filters in that case.
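
To make the "filter" semantics concrete, here is a rough Python sketch of the behavior as I understand it. This is not the actual ORTE code: the function name, the plain list-of-hostnames representation, and the error text are all made up for illustration.

```python
def filter_hosts(allocation, requested):
    """Treat `requested` (from -hostfile or -host) as a filter on `allocation`.

    `allocation` stands in for the default hostfile or RM allocation.
    Hypothetical helper for illustration only, not Open MPI's implementation.
    """
    allowed = set(allocation)
    # Any requested host outside the allocation is an error in filter mode.
    unknown = [h for h in requested if h not in allowed]
    if unknown:
        raise ValueError("hosts not in allocation: " + ", ".join(unknown))
    wanted = set(requested)
    # Keep the allocation's ordering, dropping hosts that were not requested.
    return [h for h in allocation if h in wanted]


if __name__ == "__main__":
    print(filter_hosts(["n1", "n2", "n3"], ["n3", "n1"]))
```

Under these assumptions a hostfile can only narrow the baseline, never extend it, which is where the error for outside hosts comes from.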

FWIW, I believe the discussion was split on that question. I added a "none" 
option to the default hostfile MCA param so it would be ignored in the case 
where (a) the sys admin has given a default hostfile, but (b) someone wants to 
use hosts outside of it.

                MCA orte: parameter "orte_default_hostfile" (current value: <none>, data source: default value)
                          Name of the default hostfile (relative or absolute path, "none" to ignore environmental or default MCA param setting)

That said, I can see a use-case argument for behaving somewhat differently. 
We've even had cases where users have gotten an allocation from an RM, but want 
to add hosts that are external to the cluster to the job.

It would be rather trivial to modify the logic:

1. read the default hostfile or RM allocation for our baseline

2. remove any hosts on that list that are *not* in the given hostfile

3. add any hosts that are in the given hostfile, but weren't in the default 
hostfile

And subsequently do the same for -host. I think that would retain the spirit of 
the discussion, while providing more flexibility and somewhat more "expected" 
behavior.
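
The three steps above could be sketched roughly as follows in Python. This is only an illustration under my own assumptions (hosts as plain name strings, a hypothetical `apply_host_list` helper, no slot counts), not what the ORTE code does today.

```python
def apply_host_list(baseline, given):
    """Apply a hostfile (or -host list) to a baseline, per steps 1-3.

    `baseline` is the default hostfile or RM allocation (step 1).
    Hypothetical helper for illustration only.
    """
    given_set = set(given)
    result = []
    seen = set()
    # Step 2: drop baseline hosts that are not in the given list.
    for host in baseline:
        if host in given_set and host not in seen:
            result.append(host)
            seen.add(host)
    # Step 3: add given hosts that were not in the baseline.
    for host in given:
        if host not in seen:
            result.append(host)
            seen.add(host)
    return result


if __name__ == "__main__":
    baseline = ["node1", "node2", "node3"]
    after_hostfile = apply_host_list(baseline, ["node2", "ext1"])
    # The same logic is then applied again for -host.
    after_host = apply_host_list(after_hostfile, ["ext1"])
    print(after_hostfile, after_host)
```

Note that, in this simplified form, the resulting set of hosts is always exactly the given list; what the baseline contributes is the ordering of the hosts it shares with that list (and, in a real implementation, presumably whatever per-host information it carries).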

I don't have an iron in this fire as I don't use hostfiles, so I'm happy to 
implement whatever the community would like to see.
Ralph

On Jul 27, 2012, at 6:30 PM, George Bosilca wrote:

> I'm somewhat puzzled by the behavior of the -hostfile option in Open MPI. 
> Based on the FAQ, it is supposed to provide a list of resources to be used by 
> the launcher (in my case ssh) to start the processes. Makes sense so far.
> 
> However, if the configuration file contains a value for 
> orte_default_hostfile, then the behavior of the hostfile option changes 
> drastically, and the option becomes a filter (the machines must be on the 
> original list or a cryptic error message is displayed).
> 
> Overall, we have a well-defined, [mostly] consistent behavior for parameters 
> in Open MPI. We have a clearly defined order of precedence for sources of MCA 
> parameters, which makes understanding where a value comes from 
> straightforward. I'm absolutely certain there was a group discussion about 
> this unique "eccentricity" regarding the hostfile option, but I fail to 
> remember the reason we decided to go this way. Can I have a quick refresher, 
> please?
> 
> Thanks,
> george.

