On Jun 13, 2007, at 1:48 PM, Gleb Natapov wrote:

3. Use a file to convey this information, because it's better suited
to what we're trying to do (vs. MCA parameters).

Seriously, why is a file a bad thing?  The file can list interfaces
by hostname.  For example, if you have a heterogeneous setup, what's
to say that having btl_tcp_bandwidth_eth0 is not the same across all
your hosts?  That is -- the MCA parameters you're providing are not
sufficient for a true heterogeneous environment, anyway.
I don't feel strongly one way or the other. The command line approach
was much easier to implement. Is it possible to have one parser for all
BTLs, or will each one have to implement a different one?

Let's take a step back and see exactly what we *want*. Then we can talk about how to have an interface for it.

1. We want to be able to specify bandwidth/latency values for BTL modules (and possibly other kinds of modules).

2. For the common case, we want to be able to specify a single [set of] value[s] that apply uniformly across the MPI job. This already exists in MCA parameters today.

3. For another common case, we want to be able to specify a small set of values that apply uniformly to specific interfaces across the MPI job (e.g., specify different values for eth0 and eth1). This exists today in variable MCA parameters.

4. For another case (possibly uncommon?), we want to be able to specify different values for different interfaces on different hosts. This exists today by having different MCA parameter files on each host and pairing it with #3. It's not exactly convenient, but it works.
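
To make #3 and #4 concrete, here's roughly what they look like today (illustrative only; the per-interface parameter names are the btl_tcp_bandwidth_eth0 style mentioned above):

# #3: per-interface values, uniform across the whole MPI job
mpirun --mca btl_tcp_bandwidth_eth0 1000 --mca btl_tcp_bandwidth_eth1 100 ...

# #4: the same kind of values, but put in a per-host MCA parameter file
# (e.g., $HOME/.openmpi/mca-params.conf on each host)
btl_tcp_bandwidth_eth0 = 1000
btl_tcp_bandwidth_ib0 = 2000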

If we agree that these are the things we want, then I think #3 is the contentious area (I don't like variable MCA params that don't show up in ompi_info), and #4 could certainly be made more convenient (note that I previously said #4 was not possible, but I thought about it more and realized that it *is*; it's just not as convenient as, for example, a single file that lists all hosts and their individual settings and that can be replicated across a cluster). Indeed, #3 could be combined with a more-convenient #4 and solve all the problems.

If you can agree to that, then I propose a simple INI-style text file that aggregates MCA parameters based on hostname. The INI section names are hostnames, but we support simple shell-style wildcard patterns (e.g., * and ?). Consider mca-params.ini:

[head_node]
btl_tcp_if_include = eth1

[compute_nodes01*]
btl_tcp_if_include = eth0,ib0
btl_tcp_bandwidth = eth0=1000,ib0=2000

[compute_nodes02*]
btl_tcp_if_include = eth0,myri0
btl_tcp_bandwidth = eth0=1000,myri0=2000
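
For the hostname matching, I'm thinking of something along these lines (just a sketch to show the intent, not actual code; POSIX fnmatch() gives us the shell-style * and ? patterns for free):

#include <unistd.h>
#include <fnmatch.h>

/* Return non-zero if an INI section (whose name is a shell-style
 * pattern, e.g. "compute_nodes01*") applies to the local host. */
static int section_applies(const char *section_pattern)
{
    char hostname[256];

    if (0 != gethostname(hostname, sizeof(hostname))) {
        return 0;
    }
    hostname[sizeof(hostname) - 1] = '\0';
    return 0 == fnmatch(section_pattern, hostname, 0);
}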

More specifically, I'm proposing two things:

1. Have the MCA system itself accept this INI-style file that keys off hostnames, so that this works across all of Open MPI.

2. Have the bandwidth/latency MCA params accept values in two forms:
   - a single integer
   - a comma-delimited list of <interface>=<value> pairs
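
Parsing either form of #2 is straightforward; a rough sketch (again, illustrative only, not existing code):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse either "1000" or "eth0=1000,ib0=2000" and print what was found;
 * a real implementation would store the values per BTL module instead. */
static void parse_bandwidth_param(const char *value)
{
    char *copy = strdup(value);
    char *saveptr = NULL;
    char *tok = strtok_r(copy, ",", &saveptr);

    while (NULL != tok) {
        char *eq = strchr(tok, '=');
        if (NULL == eq) {
            /* single-integer form: applies to all interfaces */
            printf("all interfaces: %ld\n", strtol(tok, NULL, 10));
        } else {
            *eq = '\0';
            printf("%s: %ld\n", tok, strtol(eq + 1, NULL, 10));
        }
        tok = strtok_r(NULL, ",", &saveptr);
    }
    free(copy);
}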

Thoughts?

BTW, ompi_info will not parse this file either, so it will not be able to
present the correct bandwidth/latency values, just like with the command line
solution. For a heterogeneous config, a file is the only option, of course.

True. But I think it's a reasonable expectation that ompi_info should show all user-available MCA parameters. It doesn't claim to show data files (like the HCA params file).

--
Jeff Squyres
Cisco Systems
