Hello Yohann,

Actually, I think that comment in the code is out of date. It looks like ompi_mtl_base_select tries to open all MTL components. That worked fine until recently because the commercially supported MTLs available (mxm and psm) are orthogonal to each other, and the portals4 MTL shouldn't interfere with either mxm or psm.
What should probably be done is to rely on the MCA's priority scheme. You can see an example of how this works in pmix_base_select.c and the various pmix s1/s2/cray component files, e.g. the pmix_s2_component_query function in pmix_s2_component.c. LANL would be interested in working with you on this if you need help. We have both intel/infinipath and intel/mlnx systems. On the former, the head/IO nodes typically have Mellanox HCAs as well, since those HCAs are usually better for interfacing to Lustre. So we have non-trivial build and runtime environments that would be well suited to testing whether we broke something.

Howard

2015-01-09 17:36 GMT-07:00 Burette, Yohann <yohann.bure...@intel.com>:
> Hi,
>
> For those of you who don't know me, my name is Yohann Burette, I work for
> Intel and I contributed the OFI MTL.
>
> AFAIK, the PSM MTL should have priority over the OFI MTL.
>
> Please excuse my ignorance, but is there a way to express this priority in
> the MTLs? Here is what is in ompi/mca/mtl/base/mtl_base_frame.c:
>
> /*
>  * Function for selecting one component from all those that are
>  * available.
>  *
>  * For now, we take the first component that says it can run. Might
>  * need to reexamine this at a later time.
>  */
> int
> ompi_mtl_base_select(bool enable_progress_threads,
>                      bool enable_mpi_threads)
>
> Am I missing anything?
>
> Thanks in advance,
> Yohann
>
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
> (jsquyres)
> Sent: Friday, January 09, 2015 1:27 PM
> To: Open MPI Developers List
> Subject: Re: [OMPI devel] Changed behaviour with PSM on master
>
> +1 -- someone should file a bug.
>
> I think Intel needs to decide how they want to handle this (e.g., whether
> the PSM MTL or OFI MTL should be the default, and how the other can detect
> if it's not the default and therefore it's safe to call psm_init... or
> something like that).
>
> > On Jan 9, 2015, at 4:10 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
> >
> > Hi Adrian,
> >
> > Please open an issue. We don't want users having to explicitly
> > specify the mtl to use just to get a job to run on an intel/infinipath system.
> >
> > Howard
> >
> > 2015-01-09 13:04 GMT-07:00 Adrian Reber <adr...@lisas.de>:
> > Should I still open a ticket? Will this be changed, or do I always
> > have to provide '--mca mtl psm' in the future?
> >
> > On Fri, Jan 09, 2015 at 12:27:59PM -0700, Howard Pritchard wrote:
> > > Hi Adrian, Andrew,
> > >
> > > Sorry, try again: both the libfabric psm provider and the Open MPI
> > > psm mtl are trying to use psm_init.
> > >
> > > So, to avoid this problem, add
> > >
> > > --mca mtl psm
> > >
> > > to your mpirun command line.
> > >
> > > Sorry for the confusion.
> > >
> > > Howard
> > >
> > >
> > > 2015-01-09 7:52 GMT-07:00 Friedley, Andrew <andrew.fried...@intel.com>:
> > >
> > > > No, this is not expected behavior.
> > > >
> > > > The PSM MTL code has not changed in 2 months, when I fixed that
> > > > unused variable warning for you. That suggests something above
> > > > the PSM MTL broke things. I see no reason your older software
> > > > install should suddenly stop working if all you are updating
> > > > is OMPI master -- at least with respect to PSM anyway.
> > > >
> > > > The error message is right, it's not possible to open more than
> > > > one context per process. This hasn't changed. It does indicate
> > > > that maybe something is causing the MTL to be opened twice in each
> > > > process?
> > > >
> > > > Andrew
> > > >
> > > > > -----Original Message-----
> > > > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of
> > > > > Adrian Reber
> > > > > Sent: Friday, January 9, 2015 4:13 AM
> > > > > To: de...@open-mpi.org
> > > > > Subject: [OMPI devel] Changed behaviour with PSM on master
> > > > >
> > > > > Running the mpi_test_suite on master used to work with no problems.
> > > > > At some point, however, it stopped working, and now I only get
> > > > > error messages from PSM:
> > > > >
> > > > > """
> > > > > n050301:3.0.In PSM version 1.14, it is not possible to open more than one
> > > > > context per process
> > > > >
> > > > > [n050301:26526] Open MPI detected an unexpected PSM error in opening an
> > > > > endpoint: In PSM version 1.14, it is not possible to open more than one
> > > > > context per process
> > > > > """
> > > > >
> > > > > I know that I do not have the newest version of the PSM library and that I
> > > > > need to update it, but as this requires many software packages to be
> > > > > re-compiled, we are trying to avoid it on our CentOS 6 based system.
> > > > >
> > > > > My main question (probably for Andrew) is whether this is expected
> > > > > behaviour on master. It works on 1.8.x, and it used to work on master
> > > > > at least until 2014-12-08.
> > > > >
> > > > > This is the last MTT entry for working PSM (with my older version):
> > > > > http://mtt.open-mpi.org/index.php?do_redir=2226
> > > > >
> > > > > and for a few days now it has been failing on master:
> > > > > http://mtt.open-mpi.org/index.php?do_redir=2225
> > > > >
> > > > > On another system (RHEL7) with newer PSM libraries there is no such error.
> > > > >
> > > > > Adrian
> > > > _______________________________________________
> > > > devel mailing list
> > > > de...@open-mpi.org
> > > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > > Link to this post:
> > > > http://www.open-mpi.org/community/lists/devel/2015/01/16766.php
>
> --
> Jeff Squyres
> jsquy...@cisco.com