Paul,

I tried PSM_RCVTHREAD=0, but it did not help.

Jeff,

You did not read too much into it ... but my words were not quite accurate.

Yes, the signal handlers are set in the library constructor.
From reading the source code, I found that this can be avoided by setting
the as yet undocumented IPATH_NO_BACKTRACE environment variable (at your own risk
if you have some InfiniPath hardware).
I also noted that the signal handlers are not restored (in a destructor), and that is
likely the root cause of the crash: once libinfinipath is dlclose'd, the handlers
it installed are presumably still registered, but the code they point to is gone.
That means if ompi is configure'd with --disable-dlopen, the behavior is going to be different
(I did not test ...), since libinfinipath is not dlclose'd.
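
To make the constructor/destructor point concrete, here is a minimal sketch of a library that saves the previous SIGSEGV disposition when it installs its handler and puts it back on unload. It is not libinfinipath code: every demo_* name is hypothetical, the handler body is a stand-in, and treating any value of IPATH_NO_BACKTRACE as "do not install" is an assumption about the variable's semantics.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* All demo_* names are hypothetical; this is not libinfinipath code. */

static struct sigaction demo_saved_segv; /* disposition found at install time */
static int demo_installed = 0;           /* 1 while our handler is active     */

/* Stand-in for a backtrace handler: only async-signal-safe calls. */
static void demo_backtrace_handler(int signo)
{
    _exit(128 + signo);
}

/* Install the handler unless IPATH_NO_BACKTRACE is set (assumed semantics:
 * any value disables it). Returns 1 if installed, 0 otherwise. */
static int demo_install_handlers(void)
{
    struct sigaction sa;

    if (demo_installed || getenv("IPATH_NO_BACKTRACE") != NULL)
        return 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_backtrace_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &demo_saved_segv) != 0)
        return 0;
    demo_installed = 1;
    return 1;
}

/* Put the previous disposition back, so that nothing points into this
 * library once dlclose() has unmapped it. */
static void demo_restore_handlers(void)
{
    if (!demo_installed)
        return;
    sigaction(SIGSEGV, &demo_saved_segv, NULL);
    demo_installed = 0;
}

/* GCC/Clang extension: run at dlopen()/program start and at dlclose()/exit. */
__attribute__((constructor)) static void demo_ctor(void) { demo_install_handlers(); }
__attribute__((destructor))  static void demo_dtor(void) { demo_restore_handlers(); }
```

Without the destructor, a later SIGSEGV in a process that has dlclose'd the library would jump to an address that is no longer mapped. Restoring blindly is itself a simplification: if another component installed its own handler in the meantime, this sketch would overwrite it.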

The Java binding must be a dynamic library (libmpi_java.so), and I did not try to configure
with --enable-mpi-java --enable-static --disable-shared
(not sure whether that would work, e.g. with libmpi_java.so linked against libmpi.a and friends, or fail at build time or at runtime).

I will also shut up from now on and let the fine folks at Intel implement a definitive solution :-D

Cheers,

Gilles

On 8/27/2015 12:41 AM, Jeff Squyres (jsquyres) wrote:
On Aug 26, 2015, at 11:29 AM, Ralph Castain <r...@open-mpi.org> wrote:
...but only when the PSM MTL is not compiled directly into libmpi, e.g., via 
using --disable-dlopen, or --enable-static (neither of which are the default, 
but it's worth mentioning).
Is that true? If the problem lies in not “nicely” handling the errhandler 
registrations, then so long as PSM is not selected, it shouldn’t have an impact.
That's what Gilles said.  ...er, I guess he didn't state that directly; he said 
that the signal handler is set when the DSO is dlopened, which I took to mean 
that it occurs during a library constructor (and is therefore independent of 
Open MPI selection):

     http://www.open-mpi.org/community/lists/devel/2015/08/17857.php

I could be reading too much into Gilles' words, though...

I'll just shut up now and let Intel provide definitive answers to this issue.  
:-)

