On Tue, Oct 20, 2015 at 1:47 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> On Oct 20, 2015, at 4:35 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > As an example, I might have two ethernet cards, one of which is a Cisco
> > VNIC.
> > I would want to be able to control which BTL or MTL is used on those NICs
> > independently, including the option to disable use of one or the other.
> > I do not want to learn distinct include/exclude MCA params for every BTL
> > and MTL to accomplish that.
>
> Hmm.
>
> I think heterogeneous multirail is still pretty uncommon. It might still
> be ok to force users (or better yet, their admins -- via the global
> mca-params.conf file) to use level 3 to precisely specify which network /
> OMPI API to use (e.g., BTL, MTL, ...etc.).
>

I think a reasonable fraction of IB-connected clusters also have an Ethernet network plus IPoIB enabled (thus two IP networks). So, I don't agree that heterogeneous multirail is "pretty uncommon".

Regardless, let's ignore heterogeneous multirail for a moment and consider a related problem in the homogeneous case. I have multiple ports of the same type, let's say a dual-port Mellanox HCA, and just want to disable one of them (reserving it for Lustre, perhaps). If OMPI is hiding the details of API selection from me, how do I enable/disable specific ports?

Right now I believe I need two distinct MCA params to instruct ibv and mxm both to exclude a given IB port. I assume I will need two more params to tell ofi and ucx not to use the port either, right? Now let's assume I've got the portals4 reference implementation (over verbs) installed too. That makes no fewer than 5 MCA params I might need to pass to tell 5 different components to keep their hands off the reserved port. If I didn't know the HCA model/vendor, I might need a sixth and seventh MCA param to tell psm and psm2 not to use the port. However, if the port name is something like "mlx4_0.0" or "ipath0.0", then we can at least tell whether mxm or psm* are even possible.
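For illustration, a minimal sketch of the one per-component exclusion whose spelling is certain: the openib BTL's btl_openib_if_exclude, set transiently for a single job through Open MPI's standard OMPI_MCA_ environment prefix rather than the global config file. The device and port (port 2 of mlx4_0) are hypothetical choices; the corresponding knobs for mxm, ofi, ucx, portals4, psm and psm2 are deliberately not shown, since each is named differently and that fan-out is exactly the complaint.

```shell
#!/bin/sh
# Transient, per-job exclusion of one IB port. The device/port pair
# (port 2 of mlx4_0) is a hypothetical example; the openib BTL accepts
# entries of the form device or device:port.
export OMPI_MCA_btl_openib_if_exclude=mlx4_0:2

# The same setting could instead go on the mpirun command line:
#   mpirun --mca btl_openib_if_exclude mlx4_0:2 ./a.out

# mxm, ofi, ucx, portals4, psm and psm2 would each need their own,
# differently named setting to leave the same port alone (not shown).
echo "btl_openib_if_exclude=$OMPI_MCA_btl_openib_if_exclude"
```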
I don't suggest that OMPI should magically discover "aliases" for a port, but if I have IPoIB the problem grows: I need yet more MCA params to tell tcp, ofi (and maybe ucx?) not to use the corresponding ibN interface. And don't forget about oob_tcp_if_exclude.

To ensure that Jeff doesn't dodge the issue, let's assume this desire to disable one port is a *transient* need of an end user. In other words, I don't accept "the admin should place all 11 MCA params in the global config file" as a valid solution.

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900