Don’t you also have the question of, for example, PSM via the mtl/psm path 
versus PSM via the mtl/ofi path? So I think you need to split the entries in 
#2 as:

PSM/MTL
PSM/MTL/OFI

PSM2/MTL
PSM2/MTL/OFI

etc. Or we could remove the PSM/PSM2 MTL components and just drive those through
the OFI provider interface. Not sure how those groups view it...
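
For reference, the two PSM paths map onto today's low-level MCA selection 
roughly like this (a sketch only -- the OFI MTL may also want a 
provider-selection param whose exact name I won't swear to):

  # PSM via the PSM MTL
  mpirun --mca pml cm --mca mtl psm a.out

  # PSM via the OFI MTL (libfabric's psm provider)
  mpirun --mca pml cm --mca mtl ofi a.out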

I imagine others may have similar issues as OFI providers are added.

> On Oct 20, 2015, at 1:35 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
> I looked quickly over the quoted emails and didn't see something I had 
> hoped/expected to.
> 
> In addition to the "dimensions" of type, API, and PML, I think users may also 
> be concerned about the "port" dimension (or device, if you prefer).
> So, it might be worth including that in the discussion of the 
> high-level-thing-for-end-users.
> 
> As an example, I might have two Ethernet cards, one of which is a Cisco VNIC.
> I would want to be able to control which BTL or MTL is used on those NICs 
> independently, including the option to disable use of one or the other.
> I do not want to learn distinct include/exclude MCA params for every BTL and 
> MTL to accomplish that.
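> 
> (Today that means juggling per-component params along these lines -- 
> illustrative only, since the exact include/exclude param names differ per 
> component:
> 
>   # TCP BTL: restrict to a particular NIC
>   mpirun --mca btl_tcp_if_include eth0 a.out
> 
>   # openib BTL: a different param with a different naming scheme
>   mpirun --mca btl_openib_if_include mlx4_0 a.out
> )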
> 
> -Paul
> 
> On Tue, Oct 20, 2015 at 12:42 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> We talked about this on the call last week.
> 
> This item is on the agenda for the Feb dev meeting, but it's a complicated set 
> of issues and we all need to think about it a bit beforehand.  Here's a little 
> more fuel for the fire: let's at least specify the problem space a bit more 
> precisely...
> 
> One (not-even-half-baked) idea raised on the call last week was to offer 3 
> levels of specifying networks:
> 
> 1. Automatic selection.  "mpirun a.out" -- OMPI does all the selection for 
> the user.
> 2. High-level abstraction.  "mpirun <SOME NICE EASY-TO-UNDERSTAND CLI 
> OPTIONS> a.out"
> 3. Low-level specification.  "mpirun --mca btl usnic,sm,self a.out"
> 
> #1 and #3 already exist today: #1 is for most users, #3 is for OMPI experts.
> 
> #2 is the new thing.  It's intended for those who have a clue about what they 
> want but aren't necessarily OMPI or networking experts.  The trick is 
> defining what <SOME NICE EASY-TO-UNDERSTAND CLI OPTIONS> is.
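> 
> (Purely as a hypothetical strawman -- none of these options exist today -- #2 
> might look something like:
> 
>   # hypothetical options; they do not exist today
>   mpirun --network ethernet a.out
>   mpirun --network infiniband,shmem a.out
> 
> ...with OMPI mapping that to whatever APIs and components make sense.)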
> 
> The selection space is complicated -- it has (at least?) three dimensions:
> 
> 1. First, we have network types:
> 
> a. Ethernet
> b. InfiniBand
> c. uGNI
> d. InfiniPath
> e. OmniScale
> f. Shared memory
> g. SCIF
> 
> 2. Second, we have network APIs:
> 
> a. TCP
> b. usNIC (via libfabric)
> c. Verbs
> d. MXM
> e. uGNI
> f. PSM
> g. PSM2
> h. POSIX shared memory segments
> i. xpmem
> j. knem
> k. Linux CMA
> l. SCIF
> 
> 3. Third, we have Open MPI networking layers:
> 
> a. PML OB1 (and associated BTLs)
> b. PML CM (and associated MTL)
> c. PML BFO
> d. PML crcpw
> e. PML v
> f. PML Yalla
> g. PML UCX (soon)
> 
> These three dimensions can be combined in specific ways (e.g., Ethernet / TCP / 
> PML OB1 + BTLs).
> BTLs have the added complication that multiple can be used in a single job.
> Some network types can be accessed through multiple combinations.
> Obviously, not all combinations are sensible (e.g., uGNI / PSM2 / PML Yalla).
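> 
> (To make the combinations concrete, here's roughly how two of the sensible 
> ones are selected today with the low-level #3 params -- a sketch, not the only 
> way to spell them:
> 
>   # Ethernet / TCP / PML OB1 + BTLs
>   mpirun --mca pml ob1 --mca btl tcp,sm,self a.out
> 
>   # InfiniBand / MXM / PML Yalla
>   mpirun --mca pml yalla a.out
> )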
> 
> The Big Issues here are:
> 
> - the user generally only knows about the first dimension: network type.
> - the OMPI networking layer names are generally not meaningful unless you're 
> an OMPI expert.
> 
> So how do we present a "simple" / "higher-level abstraction" for the average 
> user?
> 
> 
> 
> > On Oct 12, 2015, at 11:47 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> >
> > Rolf: can you add this to the agenda?
> >
> > We're now adding multiple ways to get to the same underlying network 
> > transport, and it's getting confusing for users (I've fielded several 
> > off-list questions from users about this issue).
> >
> > - MXM: can be accessed via Yalla, the MXM MTL, (soon) UCX, and (soon) 
> > libfabric
> > - PSM: can be accessed via the PSM MTL and libfabric
> > - verbs: can be accessed via the openib BTL and libfabric
> > - PSM2: ditto
> > - uGNI: can be accessed via the uGNI BTL, portals(4?), and (soon) UCX
> > - shared memory: can be accessed via sm, vader, and (soon) UCX
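> >
> > (Just for shared memory, for example, a user can already pick between paths 
> > like these -- and that's before UCX shows up:
> >
> >   mpirun --mca btl sm,self a.out      # shared memory via the sm BTL
> >   mpirun --mca btl vader,self a.out   # shared memory via the vader BTL
> > )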
> >
> > But you can also look at this from a different perspective:
> >
> > - IB: can be used via Yalla, MXM MTL, UCX, libfabric (multiple ways)
> > - RoCE: can be used via ^^some (or all? I'm not sure) of these
> > - Cray: can be used via the uGNI BTL, portals(4?), and (soon) UCX
> >
> > ...what's a user supposed to use?
> >
> > And more specifically, how can a user enable or disable a specific type of 
> > network?  Or API?
> >
> > A recent (off-list) example I had was a user who was frustrated trying to 
> > figure out how to disable all forms of MXM (note: this is a larger issue 
> > than just MXM).
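> >
> > (Today the closest answer is to exclude every component that can reach MXM, 
> > something like the line below -- and even that won't cover future paths like 
> > UCX or a libfabric mxm provider:
> >
> >   mpirun --mca pml ^yalla --mca mtl ^mxm a.out
> > )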
> >
> > Bottom line: underlying networks can be accessed through multiple 
> > upper-layer APIs, and it creates both a mapping problem for the MPI 
> > implementation and a usability issue for users trying to be specific about 
> > which network(s) they want the MPI implementation to use.
> >
> > I don't have a solution (or even a proposal) here.  This is something we 
> > need to think / talk about.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900