> This patch enables hs-err file generation for native out-of-stack cases. It
> is an optional analysis feature one can use when JVMs mysteriously vanish -
> typically, vanishing JVMs are either native stack overflows or OOM kills.
>
> This was motivated by the analysis difficulties of bugs like
> https://bugs.openjdk.org/browse/JDK-8371630. There are many more examples.
>
> ### Motivation
>
> Today, when native stack overflows, the JVM dies immediately without an
> hs-err file. This is because C++-compiled code does not bang - if the stack
> is too small, we walk right into whatever caps the stack. That might be our
> own yellow/red guard pages, native guard pages placed by libc or kernel, or
> possibly unmapped area after the end of the stack.
>
> Since we don't have a stack left to run the signal handler on, we cannot
> produce the hs-err file. If one is very lucky, the libc writes a short "Stack
> overflow" to stderr. But usually not: if it is a JavaThread and we run into
> our own yellow/red pages, it counts as a simple segmentation fault from the
> OS's point of view, since the fault address is inside of what it thinks is a
> valid pthread stack. So, typically, you just see "Segmentation fault" on
> stderr.
>
> ***Why do we need this patch? Don't we bang enough space for native code we
> call?***
>
> We bang when entering a native function from Java. The maximum stack size we
> assume at that time might not be enough; moreover, the native code may be
> buggy or just too deeply or infinitely recursive.
>
> ***We could just increase `ShadowPages`, right?***
>
> Sure, but the point is we have no hs-err file, so we don't even know it was a
> stack overflow. One would have to start debugging, which is work-intensive
> and may not even be possible in a customer scenario. And for buggy recursive
> code, any `ShadowPages` value might be too small. The code would need to be
> fixed.
>
> ### Implementation
>
> The patch uses alternative signal stacks. That is a simple, robust solution
> with few moving parts. It works out of the box for all cases:
> - Stack overflows inside native JNI code from Java
> - Stack overflows inside Hotspot-internal JavaThread children (e.g.
>   CompilerThread, AttachListenerThread etc)
> - Stack overflows in non-Java threads (e.g. VMThread, ConcurrentGCThread)
> - Stack overflows in outside threads that are attached to the JVM, e.g.
>   third-party JVMTI threads
>
> The drawback of this simplicity is that it is not suitable for always-on
> production use. That is due to the added footprint costs of alternative
> stacks: eve...
Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 26 additional commits since the last revision:

 - Merge branch 'openjdk:master' into altsigstack
 - fix runtime/ErrorHandling/ReattemptErrorTest.java
 - fix windows build
 - feedback yasumasa
 - fix MachCodeFramesInErrorFile
 - feedback david
 - Update test/hotspot/jtreg/gtest/NativeStackOverflowGtest.java

   Co-authored-by: David Holmes <[email protected]>
 - fix gtests
 - default-off
 - reduce diff
 - ... and 16 more: https://git.openjdk.org/jdk/compare/50632433...e247a6cf

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29559/files
  - new: https://git.openjdk.org/jdk/pull/29559/files/e2ec1f43..e247a6cf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=29559&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29559&range=04-05

  Stats: 180537 lines in 4380 files changed: 79448 ins; 80487 del; 20602 mod
  Patch: https://git.openjdk.org/jdk/pull/29559.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29559/head:pull/29559

PR: https://git.openjdk.org/jdk/pull/29559
