REVISION 2 (based on feedback in last 24 hours). Changes:
- NETWORK instead of NETWORK_TYPE
- Shared memory and process loopback are not affected by this CLI
- Change the OPAL API usage.

I actually like points 1-8 below quite a bit.  If implemented in ALL BTLs/MTLs/etc., it can solve the "how do I disable XYZ across all of Open MPI?" problem nicely.

Point 9 -- what does QUALIFIER mean/how is it used? -- still needs work (no real updates since rev 1 of this proposal).  I am thinking that QUALIFIER (somehow) can be used to figure out which OMPI code path to use for a given network (e.g., BTL vs. MTL, etc.).

-----

mpirun --[enable|disable] NETWORK[:QUALIFIER][,NETWORK[:QUALIFIER]]*

# Or "--[net|nonet]", or some other name if "enable|disable" is too general.
# Suggestions welcome.

1. The intent of these CLI options is to easily enable/disable specific network types and/or specific interfaces.

2. The use of shared memory and process loopback is assumed (and is not affected by these CLI options -- the "expert" level must be used if specific control over shared memory / loopback is desired).

3. Both forms take a comma-delimited list of 1 or more items.

4. --enable would work similarly to our "include" MCA params: OMPI will *only* use the network type(s) listed (but will still use shared memory and process loopback).

5. --disable would work similarly to our "exclude" MCA params: OMPI will use all network types *except* those listed (but will still use shared memory and process loopback).

6. NETWORK values can generally be one of three things:

   - a human-recognizable name (e.g., "ib", "ethernet", ...etc.)
   - a Linux interface device name (e.g., "eth0", "usnic_0", "mlx4_0", optionally specifying a specific port if desired and relevant, such as "mlx4_0:1")
   - a network address (e.g., "10.20.0.0/16", which identifies a specific network interface+port)

7. NETWORK and QUALIFIER values are parsed (by orterun/etc.) and distributed to MPI processes.

8. MPI processes can query the NETWORK values during BTL/MTL/etc. initialization and selection.

   It may be sufficient to have a simple "did the user specify this NETWORK value?" (case insensitive) query function that just returns a boolean.

   For example, the TCP BTL could look like this (only showing "enable" logic for simplicity -- adding "disable" logic is an exercise left for the reader):

-----
if (opal_network_value("eth") ||
    opal_network_value("ethernet")) {
    want_all_ip_interfaces = true;
} else {
    foreach IP_interface {
        // Search for strings like "eth0" or "10.10.0.0/16"
        if (opal_network_value(ip_interface_name) ||
            opal_network_value(CIDR of ip_interface_name)) {
            push(@desired_interfaces, ip_interface_name);
        }
    }
}

foreach IP_interface {
    if (want_all_ip_interfaces ||
        @desired_interfaces contains ip_interface) {
        make a module for that IP interface
    }
}
-----

   The usnic BTL would likely be quite similar to the TCP BTL, but also look for strings like "usnic_0".

   The openib BTL could look like this:

-----
if (opal_network_value("ib") ||
    opal_network_value("infiniband")) {
    want_all_ib_interfaces = true;
} else if (opal_network_value("roce")) {
    want_all_roce_interfaces = true;
} else if (opal_network_value("iwarp")) {
    want_all_iwarp_interfaces = true;
} else if (opal_network_value("eth") ||
           opal_network_value("ethernet")) {
    want_all_roce_interfaces = true;
    want_all_iwarp_interfaces = true;
} else {
    foreach verbs_interface {
        // Search for strings like "mlx4_0" or "10.50.0.0/16"
        // for RoCE/iWARP/IB with IPoIB enabled.
        // Could also search for IB subnet IDs, if desired...?
        if (opal_network_value(verbs_interface_name) ||
            opal_network_value(subnet ID of verbs_interface_name) ||
            opal_network_value(IP CIDR of verbs_interface_name)) {
            push(@desired_interfaces, verbs_interface_name);
        }
    }
}

foreach verbs_interface {
    make_module = false;
    if (@desired_interfaces contains verbs_interface) {
        make_module = true;
    } else if (verbs_interface is IB && want_all_ib_interfaces) {
        make_module = true;
    } else if (verbs_interface is RoCE && want_all_roce_interfaces) {
        make_module = true;
    } else if (verbs_interface is iWARP && want_all_iwarp_interfaces) {
        make_module = true;
    }

    if (make_module) {
        make a module for that verbs interface
    }
}
-----

   I imagine that the MXM MTL, Yalla PML, and hcoll and FCA colls could be similar, but slightly simpler since they (presumably) don't care about iWARP interfaces.

   PSM / PSM2 / uGNI / Portals / etc. can all do similar things.

   The key here is that ALL BTLs, MTLs, OSC, and COLL modules -- anything that talks directly to the network -- will need to use this opal_network_value() API.

9. The ":QUALIFIER" value is optional for each NETWORK specified, and can be used to disambiguate when a given network type can be reached multiple ways in OMPI.  E.g., it can help choose between the openib BTL, the MXM MTL, and the Yalla PML.  E.g.:

   mpirun --enable ib:btl
   mpirun --enable ib:mtl
   mpirun --enable ib:yalla

   That being said, I don't like these names (btl, mtl, yalla) because they mean nothing to non-OMPI experts.  But I like the concept that a QUALIFIER can (somehow) help choose between the different OMPI code paths.

   Here's another example:

   mpirun --enable eth:tcp
   mpirun --enable eth:usnic

   These QUALIFIER values are a *little* better, but not much -- the user still has to know that they exist to know to choose one of them ("tcp" and "usnic").  But note that usNIC will someday have tag matching support, so it will be able to be used through the OFI MTL, too.  Hence, "eth:usnic" won't be unique...

...thoughts?
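
To make points 3, 6, and 8 a bit more concrete, here is a minimal C sketch of what the boolean query could look like inside a single process. Everything in it is an assumption for illustration only: opal_network_value() is just the name proposed above (no such function exists yet), network_list_parse() and the fixed-size storage are hypothetical, and how orterun would distribute the values to the MPI processes (point 7) is not shown. It also does not settle how a port suffix such as "mlx4_0:1" is told apart from a ":QUALIFIER"; it simply matches either the whole item or the part before the first ':'. It relies on strcasecmp()/strncasecmp() from POSIX <strings.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp, strncasecmp (POSIX) */

#define NETWORK_MAX_ITEMS 32   /* arbitrary limits for this sketch */
#define NETWORK_MAX_LEN   64

/* Hypothetical per-process storage for the parsed --enable list. */
static char network_items[NETWORK_MAX_ITEMS][NETWORK_MAX_LEN];
static size_t network_num_items = 0;

/*
 * Hypothetical helper: split a comma-delimited list of
 * NETWORK[:QUALIFIER] items and remember each item verbatim.
 * Empty items are skipped.  Returns the number of items stored,
 * or -1 if an item is too long or there are too many items.
 */
int network_list_parse(const char *list)
{
    assert(list != NULL);
    network_num_items = 0;

    const char *p = list;
    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length up to next comma */
        if (len > 0) {
            if (len >= NETWORK_MAX_LEN ||
                network_num_items == NETWORK_MAX_ITEMS) {
                network_num_items = 0;
                return -1;
            }
            memcpy(network_items[network_num_items], p, len);
            network_items[network_num_items][len] = '\0';
            network_num_items++;
        }
        p += len;
        if (*p == ',') {
            p++;                        /* step over the delimiter */
        }
    }
    return (int) network_num_items;
}

/*
 * Sketch of the proposed query: "did the user specify this NETWORK
 * value?" (case insensitive).
 */
bool opal_network_value(const char *name)
{
    assert(name != NULL);
    size_t name_len = strlen(name);

    for (size_t i = 0; i < network_num_items; i++) {
        const char *item = network_items[i];

        /* Whole-item match, e.g. "mlx4_0:1" or "10.20.0.0/16". */
        if (strcasecmp(item, name) == 0) {
            return true;
        }

        /* Match on the part before the first ':', so that "ib"
         * matches an item such as "ib:btl". */
        size_t net_len = strcspn(item, ":");
        if (item[net_len] == ':' && net_len == name_len &&
            strncasecmp(item, name, net_len) == 0) {
            return true;
        }
    }
    return false;
}
```

With this sketch, after network_list_parse("ib:btl,eth0"), opal_network_value("IB") and opal_network_value("eth0") both return true, while opal_network_value("eth") returns false because only the specific interface was listed. A component would still have to decide for itself what a generic name such as "eth" implies, as in the TCP and openib pseudocode above.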
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/