Hi all,

may I please have opinions and reviews on this cleanup-fix-patch for the 
hotspot signal handlers.

Its main intent is to simplify coding and to commonize some of it across all 
Posix platforms where possible. Also to fix a number of smaller issues.

This will have a number of benefits, mainly easing maintenance pain for porters 
and reducing bitrot for platform dependent code.

This all builds upon the work @gerard-ziemski did with 
https://bugs.openjdk.java.net/browse/JDK-8252324.

----

This cleanup was made more complicated by the fact that there exists a 
non-obvious and undocumented way for a third party app to chain signal handlers 
(beside the documented one of using libjsig). It seems that the 
JVM_handle_xxx_functions() are in fact interfaces for third party coding to 
invoke hotspot signal handling. This only makes sense in combination with 
`-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag 
is used quite a bit.

See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling 
this bit of history.

Unfortunately there is no official documentation from Sun or Oracle, and zero 
regression tests. So I try to preserve this interface as best as I can. I plan 
to add a proper regression test with a later change, but for now I don't have 
the time for that.

---

The fixed issues:

1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform 
handlers to guard a part of the platform handlers against execution in case the 
signal handlers are not yet installed. 

Initially this confused me since when this handler is called it would of course 
be installed. So that boolean would always be true. The only explanation I 
found was that since these handlers can be invoked directly from outside, this 
is some (ineffective) form of guard against calling this handler too early. 
But that guard can be left out and that boolean removed. Our signal handlers 
are safe to call before VM initialization is completed.

2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as 
bool, some as int) as well as unused in normal code paths (excluding outside 
calls).

3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a 
day-zero bug which caused it to not be exported.

4) Every platform handler has this section:

  JavaThread* thread = NULL;
  VMThread* vmthread = NULL;
  if (PosixSignals::are_signal_handlers_installed()) {
    if (t != NULL ){
      if(t->is_Java_thread()) {
        thread = t->as_Java_thread();
      }
      else if(t->is_VM_thread()){
        vmthread = (VMThread *)t;
      }
    }
  }

`vmthread` is unused on all platforms and can be removed.

5) Every platform handler has some variant of this section, to ignore SIGPIPE, 
SIGXFSZ (whose default actions are to terminate the VM):

  if (sig == SIGPIPE || sig == SIGXFSZ) {
    // allow chained handler to go first
    if (PosixSignals::chained_handler(sig, info, ucVoid)) {
      return true;
    } else {
      // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
      return true;
    }
  }

- On s390 and ppc, we miss SIGXFSZ handling 
   _Update: Fixed separately for easier backport, see 
[https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_.
- both paths return true - section can be shortened

Side note: having handlers for those signals may be unnecessary. We could just 
set the signal handler to `SIG_IGN`. We would have to tiptoe around any third 
party handlers for those signals, but it still may be simpler.

6) At the end of every platform header, before calling into fatal error 
handling, we unblock the signal:

>  // unmask current signal
>  sigset_t newset;
>  sigemptyset(&newset);
>  sigaddset(&newset, sig);
>  sigprocmask(SIG_UNBLOCK, &newset, NULL);
>

- Use of `sigprocmask()` is UB in a multithreaded program. 
- but then, this section is unnecessary anyway, since 
[https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask error 
signals at the start of the signal handler.

7) the JFR crash protection is not consistently checked in all platform 
handlers.

8) On Zero, when entering fatal error handling, we do so via fatal() instead of 
VMError::report_and_die(), thereby discarding the real crash context and 
obfuscating the register content in the hs-err file (we still see registers, 
but those stem from the assertion-poison-page mechanism).

9) on Linux ppc64 and AIX, we have this section:

>      if (sig == SIGILL && (pc < (address) 0x200)) {
>        goto report_and_die;
>      }

which is related to the fact that the zero page on AIX is readable, filled with 
0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is 
not a noop on PPC, so we don't nop-slide).

This coding is irrelevant on Linux. On AIX, it can also be removed, since this 
SIGILL would be unrecognized by the hotspot and later count as fatal error 
anyway.

10) When invoking the fatal error handler, we extract the pc from the context 
and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally 
correct. According to POSIX [3], for those signals the address of the faulting 
instruction is handed over in `si_info.si_addr`. 

On most platforms this does not matter, they are the same. But on some 
architectures the pc in the signal context actually points somewhere else, e.g. 
beyond the faulting instruction. Therefore `si_info.si_addr` is the better 
choice.

----

The changes in this patch:

a) hotspot signal handling is now done by the following functions:

<OS>                 <foreign code, probably signal handler too>
  |                           |
  v                           v
 javaSignalHandler       JVM_handle_linux_signal()
       |                   /
       v                  v
     javaSignalHandler_inner 
               |
               v
     PosixSignals::pd_hotspot_signal_handler()


The right branch only exists to support the `AllowUserSignalHandlers` case, see 
[4].

`javaSignalHandler` is registered as handler, as it was before; 
JVM_handle_linux_signal() is exported as before.
`javaSignalHandler_inner` contains the shared portion of the signal handler; 
`PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform 
dependent portions.


b) I commonized prologue- and epilogue coding. 
  - I simplified (4) to a single line in the shared handler
  - I moved the JFR thread crash protection (7) up to the shared handler
  - I moved the complete epilogue up to the shared handler. That includes 
calling the chained handlers, should they exist, as well as invoking the fatal 
error handler. That fixes (8) and (6)
  - Zero has this tradition of showing a robot telling a cat about the error 
signal, which I like, and kept.
  - I simplified (5) and commonized it, and removed (9) completely
  - In PosixSignals::install_signal_handlers(), I removed the 
`signal_handlers_are_installed` guard and replaced it with an assert. 
Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this 
clashes too much with your patch for JDK-8253742, I'll leave that part out.

Thanks for reviewing.

Testing: this patch ran through our nightlies, in an earlier form. They will be 
re-ran some more times.

I'd be happy if aarch64 porters could take a look at the aarch64 portion of 
this change.

Please note that I had to draw a line somewhere - this is an open ended issue 
and a lot more could be cleaned. See also Gerard's work on [5], which is under 
review too. 

----

[1] 
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
[2] 
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
[3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
[4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
[5] https://bugs.openjdk.java.net/browse/JDK-8253742

-------------

Commit messages:
 - Initial patch

Changes: https://git.openjdk.java.net/jdk/pull/1034/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255711
  Stats: 916 lines in 13 files changed: 116 ins; 686 del; 114 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1034.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1034/head:pull/1034

PR: https://git.openjdk.java.net/jdk/pull/1034

Reply via email to