On Wed, 25 Mar 2026 11:44:45 GMT, Thomas Stuefe <[email protected]> wrote:
>> This patch enables hs-err file generation for native out-of-stack cases. It >> is an optional analysis feature one can use when JVMs mysteriously vanish - >> typically, vanishing JVMs are either native stack overflows or OOM kills. >> >> This was motivated by the analysis difficulties of bugs like >> https://bugs.openjdk.org/browse/JDK-8371630. There are many more examples. >> >> ### Motivation >> >> Today, when native stack overflows, the JVM dies immediately without an >> hs-err file. This is because C++-compiled code does not bang - if the stack >> is too small, we walk right into whatever caps the stack. That might be our >> own yellow/red guard pages, native guard pages placed by libc or kernel, or >> possibly unmapped area after the end of the stack. >> >> Since we don't have a stack left to run the signal handler on, we cannot >> produce the hs-err file. If one is very lucky, the libc writes a short >> "Stack overflow" to stderr. But usually not: if it is a JavaThread and we >> run into our own yellow/red pages, it counts as a simple segmentation fault >> from the OS's point of view, since the fault address is inside of what it >> thinks is a valid pthread stack. So, typically, you just see "Segmentation >> fault" on stderr. >> >> ***Why do we need this patch? Don't we bang enough space for native code we >> call?*** >> >> We bang when entering a native function from Java. The maximum stack size we >> assume at that time might not be enough; moreover, the native code may be >> buggy or just too deeply or infinitely recursive. >> >> ***We could just increase `ShadowPages`, right?*** >> >> Sure, but the point is we have no hs-err file, so we don't even know it was >> a stack overflow. One would have to start debugging, which is work-intensive >> and may not even be possible in a customer scenario. And for buggy recursive >> code, any `ShadowPages` value might be too small. The code would need to be >> fixed. >> >> ### Implementation >> >> The patch uses alternative signal stacks. That is a simple, robust solution >> with few moving parts. It works out of the box for all cases: >> - Stack overflows inside native JNI code from Java >> - Stack overflows inside Hotspot-internal JavaThread children (e.g. >> CompilerThread, AttachListenerThread etc) >> - Stack overflows in non-Java threads (e.g. VMThread, ConcurrentGCThread) >> - Stack overflows in outside threads that are attached to the JVM, e.g. >> third-party JVMTI threads >> >> The drawback of this simplicity is that it is not suitable for always-on >> production use. That is due t... > > Thomas Stuefe has updated the pull request incrementally with one additional > commit since the last revision: > > fix runtime/ErrorHandling/ReattemptErrorTest.java src/hotspot/os/posix/threadAltSigStack_posix.cpp line 69: > 67: LogStream ls(lt); > 68: ls.print("Thread %zu alternate signal stack: %s (", > 69: os::current_thread_id(), (ss->ss_flags == SS_DISABLE) ? > "disabled" : "enabled"); Suggestion: ls.print("Thread %zu alternate signal stack: %s (", os::current_thread_id(), (ss->ss_flags == SS_DISABLE) ? "disabled" : "enabled"); src/hotspot/os/posix/threadAltSigStack_posix.cpp line 78: > 76: ls.print("Thread %zu failed to %s alternative signal stack (%s) (", > 77: os::current_thread_id(), (ss->ss_flags == SS_DISABLE) ? > "disable" : "enable", > 78: os::errno_name(errno)); Suggestion: ls.print("Thread %zu failed to %s alternative signal stack (%s) (", os::current_thread_id(), (ss->ss_flags == SS_DISABLE) ? "disable" : "enable", os::errno_name(errno)); src/hotspot/share/runtime/globals.hpp line 1982: > 1980: "binary search over simple linear search." ) > \ > 1981: > \ > 1982: product(bool, UseAltSigStacks, false, DIAGNOSTIC, > \ I think I'd prefer if all this code was `ifndef WINDOWS` rather than having flags that are ignored. Or pushed down into the `os/<os>/globals_<os>.hpp` files. src/hotspot/share/runtime/globals.hpp line 1983: > 1981: > \ > 1982: product(bool, UseAltSigStacks, false, DIAGNOSTIC, > \ > 1983: "Enable the use of alternative signal stacks (Non-Windows > only.") \ Suggestion: "Enable the use of alternative signal stacks during SIGBUS and SIGSEGV crash" \ "handling. (Non-Windows only.") \ src/hotspot/share/runtime/thread.hpp line 641: > 639: bool is_in_primary_or_alternate_stack(address adr) const { > 640: return is_in_full_stack(adr) || is_in_alternate_stack(adr); > 641: } This could be debug only. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007171692 PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007172948 PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007215207 PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007208744 PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007189290
