With all due respect, I think this still dodges the key question. Are we now 
saying that every user will be *required* to provide this info? If not, then 
what is the default?

Let’s face it: the default is what 90+% of the world is going to use. This all 
seems rather complex for the average user to figure out.


> On Oct 21, 2015, at 8:09 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> 
> REVISION 2 (based on feedback in the last 24 hours).
> 
> Changes:
> 
> - NETWORK instead of NETWORK_TYPE
> - Shared memory and process loopback are not affected by this CLI
> - Change the OPAL API usage.
> 
> I actually like points 1-8 below quite a bit.  If implemented in ALL 
> BTLs/MTLs/etc., it can solve the "how do I disable XYZ across all of Open 
> MPI?" problem nicely.
> 
> Point 9 -- what does QUALIFIER mean/how is it used? -- still needs work (no 
> real updates since rev 1 of this proposal).  I am thinking that QUALIFIER 
> (somehow) can be used to figure out which OMPI code path to use for a given 
> network (e.g., BTL vs. MTL, etc.).
> 
> -----
> 
>  mpirun --[enable|disable] NETWORK[:QUALIFIER][,NETWORK[:QUALIFIER]]*
>  # Or "--[net|nonet]", or some other name if "enable|disable" is too general.
>  # Suggestions welcome.
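For illustration, here is a minimal C sketch of how orterun might split the proposed argument into NETWORK/QUALIFIER pairs. The struct net_item, the function parse_net_spec(), and the fixed field sizes are hypothetical names chosen for this sketch, not existing Open MPI code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One parsed item of "NETWORK[:QUALIFIER][,NETWORK[:QUALIFIER]]*".
 * Field sizes are arbitrary limits chosen for this sketch. */
struct net_item {
    char network[64];
    char qualifier[32];   /* empty string when no ":QUALIFIER" was given */
};

/* Split `spec` on ',' and then each item on its first ':'.
 * Returns the number of items stored, or -1 for an empty item, too many
 * items, or an item that does not fit in the fixed-size fields. */
static int parse_net_spec(const char *spec, struct net_item *items, int max_items)
{
    assert(spec != NULL && items != NULL);
    int count = 0;
    const char *p = spec;
    while (1) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == 0 || count == max_items) {
            return -1;
        }
        const char *colon = memchr(p, ':', len);
        size_t net_len = colon ? (size_t)(colon - p) : len;
        size_t q_len = colon ? len - net_len - 1 : 0;
        if (net_len == 0 || net_len >= sizeof items[count].network ||
            q_len >= sizeof items[count].qualifier) {
            return -1;
        }
        memcpy(items[count].network, p, net_len);
        items[count].network[net_len] = '\0';
        if (colon) {
            memcpy(items[count].qualifier, colon + 1, q_len);
        }
        items[count].qualifier[q_len] = '\0';
        count++;
        if (!end) {
            break;            /* last item consumed */
        }
        p = end + 1;          /* continue after the ',' */
    }
    return count;
}
```

This naive split treats everything after the first ':' as the QUALIFIER, so the port syntax from point 6 ("mlx4_0:1") comes back as network "mlx4_0" with qualifier "1"; the proposal does not yet say how those two uses of ':' would be told apart.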
> 
> 1. The intent of these CLI options is to easily enable/disable specific 
> network types and/or specific interfaces.
> 
> 2. The use of shared memory and process loopback is assumed (and is not 
> affected by these CLI options -- the "expert" level must be used if specific 
> control over shared memory / loopback is desired).
> 
> 3. Both forms take a comma-delimited list of 1 or more items.
> 
> 4. --enable would work similarly to our "include" MCA params: OMPI will *only* 
> use the network type(s) listed (but will still use shared memory and process 
> loopback).
> 
> 5. --disable would work similarly to our "exclude" MCA params: OMPI will use 
> all network types *except* those listed (but will still use shared memory 
> and process loopback).
> 
> 6. NETWORK values can generally be one of three things:
> 
>   - a human-recognizable name (e.g., "ib", "ethernet", etc.)
>   - a Linux interface device name (e.g., "eth0", "usnic_0", "mlx4_0", 
> optionally specifying a specific port if desired and relevant, such as 
> "mlx4_0:1")
>   - a network address in CIDR notation (e.g., "10.20.0.0/16", which selects 
> the specific network interface+port attached to that network)
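One possible way to tell the three kinds of NETWORK value apart, sketched in C. The alias list (taken from the names used in the examples in this proposal), the IPv4-only CIDR check, and the rule that anything else is treated as a device name (with an optional ":port") are all assumptions made for this sketch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>

enum net_kind { NET_KIND_NAME, NET_KIND_DEVICE, NET_KIND_CIDR };

/* True if s looks like "a.b.c.d/len" with 0 <= len <= 32 (IPv4 only). */
static int is_ipv4_cidr(const char *s)
{
    const char *slash = strchr(s, '/');
    if (!slash || slash - s >= INET_ADDRSTRLEN) {
        return 0;
    }
    char addr[INET_ADDRSTRLEN];
    memcpy(addr, s, (size_t)(slash - s));
    addr[slash - s] = '\0';
    struct in_addr tmp;
    if (inet_pton(AF_INET, addr, &tmp) != 1) {
        return 0;
    }
    char *end;
    long len = strtol(slash + 1, &end, 10);
    return end != slash + 1 && *end == '\0' && len >= 0 && len <= 32;
}

static enum net_kind classify_network(const char *value)
{
    /* Assumed alias list, based on the names used in this proposal. */
    static const char *const names[] = {
        "ib", "infiniband", "eth", "ethernet", "roce", "iwarp"
    };
    assert(value != NULL);
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strcasecmp(value, names[i]) == 0) {
            return NET_KIND_NAME;
        }
    }
    if (is_ipv4_cidr(value)) {
        return NET_KIND_CIDR;
    }
    return NET_KIND_DEVICE;  /* e.g. "eth0", "usnic_0", "mlx4_0:1" */
}
```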
> 
> 7. NETWORK and QUALIFIER values are parsed (by orterun/etc.) and distributed 
> to MPI processes.
> 
> 8. MPI processes can query the NETWORK values during BTL/MTL/etc. 
> initialization and selection.
> 
> It may be sufficient to have a simple "did the user specify this NETWORK 
> value?" (case insensitive) query function that just returns a boolean.
> 
> For example, the TCP BTL could look like this (only showing "enable" logic 
> for simplicity -- adding "disable" logic is an exercise left for the reader):
> 
> -----
>  if (opal_network_value("eth") || opal_network_value("ethernet")) {
>      want_all_ip_interfaces = true;
>  } else {
>      foreach IP_interface {
>          // Search for strings like "eth0" or "10.10.0.0/16"
>          if (opal_network_value(ip_interface_name) ||
>              opal_network_value(CIDR of ip_interface_name)) {
>              push(@desired_interfaces, ip_interface_name);
>          }
>      }
>  }
> 
>  foreach IP_interface {
>      if (want_all_ip_interfaces || @desired_interfaces contains ip_interface) {
>          make a module for that IP interface
>      }
>  }
> -----
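The TCP logic above can be made concrete roughly as follows. This is a hedged sketch, not the TCP BTL's actual code: struct ip_iface, user_specified() (a stand-in for opal_network_value()), and select_tcp_ifaces() are hypothetical, only IPv4 is handled, and the two passes are folded into one loop that records a per-interface decision. The "CIDR of ip_interface_name" step is computed by masking the interface address with its prefix length.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <strings.h>
#include <sys/socket.h>

/* Hypothetical description of one local IPv4 interface. */
struct ip_iface {
    const char *name;   /* e.g. "eth0" */
    const char *addr;   /* dotted-quad address, e.g. "10.10.3.4" */
    int prefix_len;     /* e.g. 16 */
};

/* Stand-in for opal_network_value(): case-insensitive list lookup. */
static bool user_specified(const char *const *vals, int n, const char *v)
{
    for (int i = 0; i < n; i++) {
        if (strcasecmp(vals[i], v) == 0) {
            return true;
        }
    }
    return false;
}

/* Write the interface's network in CIDR form, e.g. "10.10.0.0/16". */
static bool iface_cidr(const struct ip_iface *ifc, char *buf, size_t len)
{
    struct in_addr a;
    if (inet_pton(AF_INET, ifc->addr, &a) != 1 ||
        ifc->prefix_len < 0 || ifc->prefix_len > 32) {
        return false;
    }
    uint32_t mask = ifc->prefix_len == 0 ? 0 : UINT32_MAX << (32 - ifc->prefix_len);
    a.s_addr = htonl(ntohl(a.s_addr) & mask);   /* keep only the network bits */
    char ip[INET_ADDRSTRLEN];
    if (!inet_ntop(AF_INET, &a, ip, sizeof ip)) {
        return false;
    }
    return snprintf(buf, len, "%s/%d", ip, ifc->prefix_len) < (int)len;
}

/* Set make_module[i] for each interface ("enable" logic only);
 * returns how many interfaces were selected. */
static int select_tcp_ifaces(const struct ip_iface *ifs, int n_ifs,
                             const char *const *vals, int n_vals, bool *make_module)
{
    assert(make_module != NULL);
    bool want_all = user_specified(vals, n_vals, "eth") ||
                    user_specified(vals, n_vals, "ethernet");
    int selected = 0;
    for (int i = 0; i < n_ifs; i++) {
        char cidr[INET_ADDRSTRLEN + 4];
        make_module[i] = want_all ||
            user_specified(vals, n_vals, ifs[i].name) ||
            (iface_cidr(&ifs[i], cidr, sizeof cidr) &&
             user_specified(vals, n_vals, cidr));
        if (make_module[i]) {
            selected++;
        }
    }
    return selected;
}
```

For example, an interface with address 10.10.3.4 and a /16 prefix yields "10.10.0.0/16", so --enable 10.10.0.0/16 would select it.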
> 
> The usnic BTL would likely be quite similar to the TCP BTL, but also look for 
> strings like "usnic_0".
> 
> The openib BTL could look like this:
> 
> -----
>  if (opal_network_value("ib") || opal_network_value("infiniband")) {
>      want_all_ib_interfaces = true;
>  } else if (opal_network_value("roce")) {
>      want_all_roce_interfaces = true;
>  } else if (opal_network_value("iwarp")) {
>      want_all_iwarp_interfaces = true;
>  } else if (opal_network_value("eth") || opal_network_value("ethernet")) {
>      want_all_roce_interfaces = true;
>      want_all_iwarp_interfaces = true;
>  } else {
>      foreach verbs_interface {
>          // Search for strings like "mlx4_0" or "10.50.0.0/16" for
>          // RoCE/iWARP/IB with IPoIB enabled.
>          // Could also search for IB subnet IDs, if desired...?
>          if (opal_network_value(verbs_interface_name) ||
>              opal_network_value(subnet ID of verbs_interface_name) ||
>              opal_network_value(IP CIDR of verbs_interface_name)) {
>              push(@desired_interfaces, verbs_interface_name);
>          }
>      }
>  }
> 
>  foreach verbs_interface {
>      make_module = false;
>      if (@desired_interfaces contains verbs_interface) {
>          make_module = true;
>      } else if (verbs_interface is IB && want_all_ib_interfaces) {
>          make_module = true;
>      } else if (verbs_interface is RoCE && want_all_roce_interfaces) {
>          make_module = true;
>      } else if (verbs_interface is iWARP && want_all_iwarp_interfaces) {
>          make_module = true;
>      }
>      if (make_module) {
>          make a module for that verbs interface
>      }
>  }
> -----
> 
> I imagine that the MXM MTL, the Yalla PML, and the hcoll and FCA colls could be 
> similar, but slightly simpler, since they (presumably) don't care about iWARP 
> interfaces.
> 
> PSM / PSM2 / uGNI / Portals / etc. can all do similar things.
> 
> The key here is that ALL BTLs, MTLs, OSC, and COLL modules -- anything that 
> talks directly to the network -- will need to use this opal_network_value() 
> API.
> 
> 9. The ":QUALIFIER" value is optional for each NETWORK specified, and 
> can be used to disambiguate when a given network type can be reached multiple 
> ways in OMPI; for instance, it can help choose between the openib BTL, the 
> MXM MTL, and the Yalla PML:
> 
>  mpirun --enable ib:btl
>  mpirun --enable ib:mtl
>  mpirun --enable ib:yalla
> 
> That being said, I don't like these names (btl, mtl, yalla) because they mean 
> nothing to non-OMPI experts.  But I like the concept that a QUALIFIER can 
> (somehow) help choose between the different OMPI code paths.
> 
> Here's another example:
> 
>  mpirun --enable eth:tcp
>  mpirun --enable eth:usnic
> 
> These QUALIFIER values are a *little* better, but not much -- the user still 
> has to know that "tcp" and "usnic" exist in order to choose one of them.  Note 
> also that usNIC will someday have tag matching support, so it will be usable 
> through the OFI MTL, too.  Hence, "eth:usnic" won't be unique...
> 
> ...thoughts?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/10/18232.php
