Yeah, my bad for being cryptic - very busy day. The PSM team is doing some internal review of the problem and coming up with solutions. Since this involves a product, the discussion has to go thru some standard review and approval procedures before we can publicly comment on it.
Our hamsters are spinning the wheel as fast as they can, but it still may take us a little bit to arrive at an answer we can share with the community. We appreciate your patience. Meantime, the proposed workaround to use the MCA param to ignore PSM should work.

Thanks
Ralph

> On Aug 26, 2015, at 3:34 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
> I think it would be good to get some hard facts here:
>
> - is the infinipath library hijacking signal handlers?
> - is the infinipath library not resetting those signal handlers when it is done?
> - is there a way to make the infinipath library release its use of signal handlers upon demand? (e.g., via API call)
> - is there a way to make the infinipath library not hijack some/all signal handlers to begin with? (e.g., Paul posted a possible workaround, but he wasn't sure about it)
> - if none of these are possible in current versions of the infinipath library, can some type of workaround be added for future versions?
>
> I'm guessing these are the kinds of questions Intel is discussing internally.
>
>> On Aug 25, 2015, at 10:08 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> Sorry - but there are some discussions that cannot and should not take place on a public mailing list. As a former corporate person yourself, you should understand :-)
>>
>>> On Aug 25, 2015, at 6:56 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>
>>> which off-list are we talking about?
>>> very annoying.
>>>
>>> 2015-08-25 10:38 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
>>> We’re looking at this off-list. It would be preferable not to disable PSM if we can avoid it
>>>
>>>> On Aug 25, 2015, at 9:32 AM, Nathaniel Graham <nrgraha...@gmail.com> wrote:
>>>>
>>>> What if we modify the mpirun script to include the --mca mtl ^psm tag if java is in the run string?
>>>>
>>>> -Nathan
>>>>
>>>> On Tue, Aug 25, 2015 at 9:47 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>> I'll update the java FAQ.
>>>>
>>>> 2015-08-25 8:36 GMT-06:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>>>> On Aug 25, 2015, at 10:00 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>
>>>>> I think rather than trying workarounds of dubious robustness inside open mpi we
>>>>>
>>>>> - document the issue on either the somewhat aged open mpi website faq or add it to a wiki page on github
>>>>
>>>> It should probably be documented in the README and the FAQ.
>>>>
>>>> I'd be against adding user documentation to the wiki -- this would be a 3rd place for users to look for information.
>>>>
>>>>> - file a bug against intel psm
>>>>
>>>> I'd like to hear what they have to say first... :-)
>>>>
>>>>>
>>>>> ----------
>>>>>
>>>>> sent from my smart phonr so no good type.
>>>>>
>>>>> Howard
>>>>>
>>>>> On Aug 25, 2015 6:02 AM, "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com> wrote:
>>>>> i do not know if this can be runtime detected ...
>>>>> note we should report this to intel folks and ask them to advise.
>>>>> ideally, they would provide a way to make sure libinfinipath.so does not conflict with the jvm signal handlers.
>>>>>
>>>>> my idea is to dlopen libinfinipath only if java bindings are not used.
>>>>>
>>>>> On Tuesday, August 25, 2015, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>>> Is it possible to run-time detect this situation? E.g., probe the signal handler, or somesuch.
>>>>>
>>>>> Rationale: I'd rather have something run-time disabled than not built.
>>>>>
>>>>> Would dlopen'ing libinfinipath actually change its signal handler behavior?
>>>>>
>>>>>> On Aug 25, 2015, at 4:27 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>>>>
>>>>>> Folks,
>>>>>>
>>>>>> some time ago, some crashes were reported when using java bindings.
>>>>>> one of them was caused by mca_mtl_psm.so.
>>>>>> the root cause is that the libinfinipath.so initializer sets its own signal handler, which conflicts with the signal handler set by the jvm.
>>>>>> the only workaround is to disable the psm mtl
>>>>>> (e.g. mpirun --mca mtl ^psm ...)
>>>>>> since mpirun --mca mtl_psm_priority 0 ... does not work
>>>>>> (libinfinipath.so is loaded, so the initializer is run and the signal handlers are set)
>>>>>> so the psm mtl cannot be disabled by the Java MPI_Init()
>>>>>>
>>>>>> one option is to document this
>>>>>> another option is not to build the psm mtl if java bindings are built
>>>>>> and another option is to revamp mca_mtl_psm.so so it does not link with libinfinipath.so
>>>>>> (use an intermediate component, or dlopen libinfinipath)
>>>>>>
>>>>>> any thoughts ?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17838.php
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
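
Jeff's question in the thread, whether the situation can be detected at run time by probing the signal handler, could be approached roughly as below. This is only a minimal sketch in C, assuming POSIX sigaction(). The function names (snapshot_handlers, count_changed_handlers) and the list of watched signals are hypothetical, since the thread does not say which signals libinfinipath.so touches, and nothing here has been tried against libinfinipath or a JVM.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Illustrative list only (an assumption): the thread does not say
 * which signals the library's initializer actually takes over. */
static const int watched_signals[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE };
enum { NUM_WATCHED = sizeof(watched_signals) / sizeof(watched_signals[0]) };

/* Record the current disposition of every watched signal.
 * Passing NULL as the second argument makes sigaction() a pure query.
 * Returns 0 on success, -1 if any query fails. */
int snapshot_handlers(struct sigaction out[NUM_WATCHED])
{
    for (int i = 0; i < NUM_WATCHED; i++) {
        if (sigaction(watched_signals[i], NULL, &out[i]) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Two dispositions match if they use the same handler style
 * (one-argument vs. SA_SIGINFO) and point at the same function. */
static int same_disposition(const struct sigaction *a,
                            const struct sigaction *b)
{
    if ((a->sa_flags & SA_SIGINFO) != (b->sa_flags & SA_SIGINFO)) {
        return 0;
    }
    if (a->sa_flags & SA_SIGINFO) {
        return a->sa_sigaction == b->sa_sigaction;
    }
    return a->sa_handler == b->sa_handler;
}

/* Count how many watched signals have a different handler in
 * 'after' than they had in 'before'. */
int count_changed_handlers(const struct sigaction before[NUM_WATCHED],
                           const struct sigaction after[NUM_WATCHED])
{
    int changed = 0;
    for (int i = 0; i < NUM_WATCHED; i++) {
        if (!same_disposition(&before[i], &after[i])) {
            changed++;
        }
    }
    return changed;
}
```

A caller would take one snapshot before the library is loaded (for example, just before a dlopen() call), a second one afterwards, and treat a non-zero count as a hint that the library's initializer replaced a handler. Whether that hint could then be used to skip or unload the psm mtl is exactly the open question in the thread.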