Paul,
I tried PSM_RCVTHREAD=0, but it did not help.
Jeff,
You did not read too much into them, but my words were not quite accurate.
Yes, the signal handlers are set in the library constructor.
By reading the source code, I found that this can be avoided by setting
the as-yet undocumented IPATH_NO_BACKTRACE environment variable (at your
own risk if you have some InfiniPath hardware).
I also noted that the signal handlers are not restored (in a destructor),
and that is likely the root cause of the crash.
That means if ompi is configure'd with --disable-dlopen, the behavior is
going to be different (I did not test ...), since libinfinipath is then
not dlclose'd.
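
To make the constructor/destructor point concrete, here is a minimal sketch
of the pattern I would expect a library to follow. It is not the
libinfinipath source: every name in it (demo_lib_init, demo_lib_fini,
demo_fatal_handler, and the DEMO_NO_BACKTRACE variable that stands in for
an opt-out switch) is hypothetical, and SIGSEGV is just an example signal.
As I understand it, the danger is that a handler left installed after
dlclose points at code that is no longer mapped.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical names throughout; this only illustrates the pattern. */
static struct sigaction demo_saved_segv; /* disposition found at load time */
static int demo_installed = 0;           /* did we replace the handler? */

static void demo_fatal_handler(int signo)
{
    static const char msg[] = "demo: caught fatal signal\n";
    /* write() and _exit() are async-signal-safe; printf() is not. */
    if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
        /* nothing useful can be done here */
    }
    _exit(128 + signo);
}

/* Runs when the shared object is loaded (e.g. by dlopen). */
__attribute__((constructor))
void demo_lib_init(void)
{
    struct sigaction sa;

    /* Assumed opt-out switch, and a guard against double installation. */
    if (demo_installed || getenv("DEMO_NO_BACKTRACE") != NULL)
        return;

    sa.sa_handler = demo_fatal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    /* Save the previous disposition so it can be put back later. */
    if (sigaction(SIGSEGV, &sa, &demo_saved_segv) == 0)
        demo_installed = 1;
}

/* Runs when the shared object is unloaded (e.g. by dlclose). */
__attribute__((destructor))
void demo_lib_fini(void)
{
    if (!demo_installed)
        return;
    /* Put the saved disposition back before our code goes away. */
    sigaction(SIGSEGV, &demo_saved_segv, NULL);
    demo_installed = 0;
}
```

Note that the restore is unconditional: if someone else installed a handler
after the library was loaded, this sketch would overwrite it, so a more
careful version would first check that the current handler is still its own.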
The Java binding must be a dynamic library (libmpi_java.so), and I did
not try to configure with --enable-mpi-java --enable-static --disable-shared
(not sure whether that would work, e.g. with libmpi_java.so linked against
libmpi.a and friends, or fail at build time or at runtime).
I will also shut up from now on and let the fine folks at Intel implement a
definitive solution :-D
Cheers,
Gilles
On 8/27/2015 12:41 AM, Jeff Squyres (jsquyres) wrote:
> On Aug 26, 2015, at 11:29 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> ...but only when the PSM MTL is not compiled directly into libmpi, e.g., via
>>> using --disable-dlopen, or --enable-static (neither of which are the default,
>>> but it's worth mentioning).
>> Is that true? If the problem lies in not “nicely” handling the errhandler
>> registrations, then so long as PSM is not selected, it shouldn’t have an impact.
> That's what Gilles said. ...er, I guess he didn't state that directly; he said
> that the signal handler is set when the DSO is dlopened, which I took to mean
> that it occurs during a library constructor (and is therefore independent of
> Open MPI selection):
> http://www.open-mpi.org/community/lists/devel/2015/08/17857.php
> I could be reading too much into Gilles' words, though...
> I'll just shut up now and let Intel provide definitive answers to this issue.
> :-)