Hi,

Currently, opal_util_register_stackhandlers() in opal/util/stacktrace.c
calls sigaction() with NULL as its third argument, so it never looks
at any previously installed signal handlers and always overrides
them with print_stackframe().

But there are realistic scenarios where an application actively
uses these signals and also wants to use MPI.  As an example, the default
opal "signal" parameter settings are such that SIGSEGV is redirected.
Typically, indeed, SIGSEGV indicates a bug somewhere, and the stacktrace
from Open MPI is a nice bonus.  However, the Sun JDK uses SIGSEGV
to detect when stacks should be automatically extended, and it stops working
rather ungracefully when that handler gets replaced.

(BTW, we stumbled on this recently when we added an MPI backend for our
Ibis grid programming environment.  It took a bit of time to figure out
what was happening, since we got no usable stacktrace for the thread that
got bitten.  We suspected a bug in our native code mapping at first,
but MPICH did not have this problem).

In most cases, you can of course work around it by manually changing
the opal "signal" list, but it would be nicer if Open MPI would detect
the situation, and e.g. only install the stack printer when there is
no handler yet, or at least warn about the possible clash.
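
To make the suggestion concrete, here is a rough sketch of the "only
install when there is no handler yet" variant.  It is not taken from the
Open MPI sources: install_if_unhandled() and stack_printer() are names I
made up for illustration, and the real code would of course loop over
the signals in the opal "signal" list.  The idea is to query the current
disposition first (sigaction() with a NULL second argument changes
nothing) and to install the stack printer only if it is still SIG_DFL.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for Open MPI's stack-printing handler. */
static void stack_printer(int signo, siginfo_t *info, void *ctx)
{
    (void)signo; (void)info; (void)ctx;
}

/*
 * Install stack_printer for signo only if nothing else handles it yet.
 * Returns 1 if installed, 0 if an existing disposition was left alone,
 * and -1 on error.
 */
static int install_if_unhandled(int signo)
{
    struct sigaction old, act;

    /* Query only: a NULL second argument leaves the disposition as is. */
    if (sigaction(signo, NULL, &old) != 0)
        return -1;

    /* A handler registered with SA_SIGINFO lives in sa_sigaction;
     * otherwise anything other than SIG_DFL (including SIG_IGN)
     * means the application has made its own choice. */
    if ((old.sa_flags & SA_SIGINFO) || old.sa_handler != SIG_DFL) {
        fprintf(stderr,
                "signal %d already has a handler, not installing "
                "the stack printer\n", signo);
        return 0;
    }

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = stack_printer;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);

    return sigaction(signo, &act, NULL) == 0 ? 1 : -1;
}
```

Note that this only helps when the other handler is installed before
MPI is initialized (as in our case, where the JVM is already running),
and that there is a small window between the query and the install.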

Thanks!
Kees Verstoep
