Don’t you also have the question of, for example, PSM via the mtl/psm versus PSM via the mtl/ofi path? So I think you need to split the entries in #2 as:
PSM/MTL
PSM/MTL/OFI
PSM2/MTL
PSM2/MTL/OFI

etc. Or we could remove the PSM/PSM2 MTL components and just drive those through the OFI provider interface. Not sure how those groups view it... I imagine others may have similar issues as OFI providers are added.

> On Oct 20, 2015, at 1:35 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> I looked quickly over the quoted emails and didn't see something I had
> hoped/expected to.
>
> In addition to the "dimensions" of type, api and pml I think users may also
> be concerned about the "port" dimension (or device if you prefer).
> So, it might be worth including that in the discussion of the
> high-level-thing-for-end-users.
>
> As an example, I might have two ethernet cards, one of which is a Cisco VNIC.
> I would want to be able to control which BTL or MTL is used on those NICs
> independently, including the option to disable use of one or the other.
> I do not want to learn distinct include/exclude MCA params for every BTL and
> MTL to accomplish that.
>
> -Paul
>
> On Tue, Oct 20, 2015 at 12:42 PM, Jeff Squyres (jsquyres)
> <jsquy...@cisco.com> wrote:
> We talked about this on the call last week.
>
> I'm guessing we'll talk about this at the Feb dev meeting, but we need to
> think about this a bit beforehand. Here's a little more fuel for the fire:
> let's at least specify the problem space a bit more precisely...
>
> (this item is on the agenda for the Feb dev meeting, but we all need to think
> about this a little before then; it's a complicated set of issues)
>
> One (not-even-half-baked) idea that was raised on the call last week was the
> idea of 3 levels of specifying networks:
>
> 1. Automatic selection. "mpirun a.out" -- OMPI does all the selection for
>    the user.
> 2. High-level abstraction. "mpirun <SOME NICE EASY-TO-UNDERSTAND CLI
>    OPTIONS> a.out"
> 3. Low-level specification.
>    "mpirun --mca btl usnic,sm,self a.out"
>
> #1 and #3 already exist today: #1 is for most users, #3 is for OMPI experts.
>
> #2 is the new thing. It's intended for those who have a clue about what they
> want, but they aren't necessarily OMPI or networking experts. The trick is
> defining what <SOME NICE EASY-TO-UNDERSTAND CLI OPTIONS> is.
>
> The selection space is complicated -- it has (at least?) three dimensions:
>
> 1. First, we have network types:
>
>    a. Ethernet
>    b. InfiniBand
>    c. uGNI
>    d. InfiniPath
>    e. OmniScale
>    f. Shared memory
>    g. SCIF
>
> 2. Second, we have network APIs:
>
>    a. TCP
>    b. usNIC (via libfabric)
>    c. Verbs
>    d. MXM
>    e. uGNI
>    f. PSM
>    g. PSM2
>    h. POSIX shared memory segments
>    i. xpmem
>    j. knem
>    k. Linux CMA
>    l. SCIF
>
> 3. Third, we have Open MPI networking layers:
>
>    a. PML OB1 (and associated BTLs)
>    b. PML CM (and associated MTL)
>    c. PML BFO
>    d. PML crcpw
>    e. PML v
>    f. PML Yalla
>    g. PML UCX (soon)
>
> These three spaces can be combined in specific ways (e.g., Ethernet / TCP /
> PML OB1 + BTLs).
> BTLs have the added complication that multiple can be used in a single job.
> Some network types can be accessed through multiple combinations.
> Obviously, not all combinations are sensible (e.g., uGNI / PSM2 / PML Yalla).
>
> The Big Issues here are:
>
> - the user generally only knows about the first dimension: network type.
> - the OMPI networking layer names are generally not meaningful unless you're
>   an OMPI expert.
>
> So how do we present a "simple" / "higher-level abstraction" for the average
> user?
>
>
> > On Oct 12, 2015, at 11:47 AM, Jeff Squyres (jsquyres)
> > <jsquy...@cisco.com> wrote:
> >
> > Rolf: can you add this to the agenda?
> >
> > We're now adding multiple ways to get to the same underlying network
> > transport, and it's getting confusing for users (I've fielded several
> > off-list questions from users about this issue).
> >
> > - MXM: can be accessed via Yalla, the MXM MTL, (soon) UCX, and (soon)
> >   libfabric
> > - PSM: can be accessed via the PSM MTL and libfabric
> > - verbs: can be accessed via the openib BTL and libfabric
> > - PSM2: ditto
> > - uGNI: can be accessed via the uGNI BTL, portals(4?), and (soon) UCX
> > - shared memory: can be accessed via sm, vader, and (soon) UCX
> >
> > But you can also look at this from a different perspective:
> >
> > - IB: can be used via Yalla, MXM MTL, UCX, libfabric (multiple ways)
> > - RoCE: can be used via ^^some (or all? I'm not sure) of these
> > - Cray: can be used via the uGNI BTL, portals(4?), and (soon) UCX
> >
> > ...what's a user supposed to use?
> >
> > And more specifically, how can a user enable or disable a specific type of
> > network? Or API?
> >
> > A recent (off list) example I had was a user who was frustrated trying to
> > figure out how to disable all forms of MXM (note: this is a larger issue
> > than just MXM).
> >
> > Bottom line: underlying networks can be accessed through multiple
> > upper-layer APIs, and it creates both a mapping problem for the MPI
> > implementation, and a usability issue for users trying to be specific about
> > which network(s) they want the MPI implementation to use.
> >
> > I don't have a solution (or even a proposal) here. This is something we
> > need to think / talk about.
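
[Inline note] To make the mapping problem above a bit more concrete, here is a minimal Python sketch, not anything that exists in Open MPI. The table is transcribed from the Oct 12 list; the names ACCESS_PATHS, paths_to_disable and apis_reachable_via are hypothetical, the strings are plain labels rather than real MCA component names, and PSM2 is left out because "ditto" does not say which paths it has.

```python
# Hypothetical table: underlying API -> (access path, planned-only?) pairs,
# transcribed from the Oct 12 list. True means the message says "(soon)".
# The strings are descriptive labels, not MCA component names.
ACCESS_PATHS = {
    "MXM": [("Yalla", False), ("MXM MTL", False),
            ("UCX", True), ("libfabric", True)],
    "PSM": [("PSM MTL", False), ("libfabric", False)],
    "verbs": [("openib BTL", False), ("libfabric", False)],
    "uGNI": [("uGNI BTL", False), ("portals(4?)", False), ("UCX", True)],
    "shared memory": [("sm", False), ("vader", False), ("UCX", True)],
}


def paths_to_disable(api, include_planned=True):
    """List every upper-layer path that reaches `api`."""
    return [path for path, planned in ACCESS_PATHS[api]
            if include_planned or not planned]


def apis_reachable_via(path):
    """List every underlying API that `path` can reach, sorted by name."""
    return sorted(api for api, paths in ACCESS_PATHS.items()
                  if any(name == path for name, _ in paths))


if __name__ == "__main__":
    # What a user who wants "no MXM at all" would have to know about.
    print(paths_to_disable("MXM"))
    # The inverse view: what a single upper layer can end up using.
    print(apis_reachable_via("libfabric"))
```

The point of the sketch is only that "disable MXM" is a one-to-many lookup today, and that the inverse lookup (what does libfabric or UCX pull in?) is just as non-obvious to a user.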
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/10/18207.php
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/10/18208.php
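
P.S. A rough sketch of the split proposed at the top of this message, in case it helps the discussion. It is plain Python and purely illustrative: Entry, SPLIT_ENTRIES, entries_for and ofi_only are hypothetical names, and nothing here corresponds to an existing Open MPI option or data structure.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One row of dimension #2 after the proposed split (hypothetical)."""
    api: str                 # network API, e.g. "PSM"
    layer: str               # Open MPI layer that drives it, "MTL" here
    provider: Optional[str]  # "OFI" when reached through mtl/ofi, else None

    def label(self):
        # Join the non-empty fields, e.g. "PSM/MTL/OFI" or "PSM/MTL".
        return "/".join(part for part in self if part)


# The four entries listed at the top of this message.
SPLIT_ENTRIES = [
    Entry("PSM", "MTL", None),
    Entry("PSM", "MTL", "OFI"),
    Entry("PSM2", "MTL", None),
    Entry("PSM2", "MTL", "OFI"),
]


def entries_for(api, entries=SPLIT_ENTRIES):
    """All ways a single API shows up once the entries are split."""
    return [e.label() for e in entries if e.api == api]


def ofi_only(entries=SPLIT_ENTRIES):
    """The alternative: drop the direct MTL components, keep only OFI."""
    return [e for e in entries if e.provider == "OFI"]


if __name__ == "__main__":
    print(entries_for("PSM"))
    print([e.label() for e in ofi_only()])
```

With the split, a high-level "PSM" choice expands to two entries; with the OFI-only alternative it expands to one, which is the simplification I was getting at.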