Re: RFR: 8373128: Stack overflow handling for native stack overflows

Johan Sjölen Sat, 28 Feb 2026 05:27:37 -0800

On Wed, 4 Feb 2026 07:19:03 GMT, Thomas Stuefe <[email protected]> wrote:


> Still Draft, pls ignore for now. Patch is not done yet.
> 
> This patch enables hs-err file generation for native out-of-stack cases. It 
> is an optional analysis feature one can use when JVMs mysteriously vanish - 
> typically, vanishing JVMs are either native stack overflows or OOM kills.
> 
> This was motivated by the analysis difficulties of bugs like 
> https://bugs.openjdk.org/browse/JDK-8371630. There are many more examples.
> 
> ### Motivation
> 
> Today, when a native stack overflows, the JVM dies immediately without an 
> hs-err file. This is because C++-compiled code does not bang - if the stack 
> is too small, we walk right into whatever caps the stack. That might be our 
> own yellow/red guard pages, native guard pages placed by libc or kernel, or 
> possibly unmapped area after the end of the stack. 
> 
> Since we don't have a stack left to run the signal handler on, we cannot 
> produce the hs-err file. If one is very lucky, the libc writes a short "Stack 
> overflow" to stderr. But usually not: if it is a JavaThread and we run into 
> our own yellow/red pages, it counts as a simple segmentation fault from the 
> OS's point of view, since the fault address is inside of what it thinks is a 
> valid pthread stack. So, typically, you just see "Segmentation fault" on 
> stderr.
> 
> ***Why do we need this patch? Don't we bang enough space for native code we 
> call?***
> 
> We bang when entering a native function from Java. The maximum stack size we 
> assume at that time might not be enough; moreover, the native code may be 
> buggy or just too deeply or infinitely recursive. 
> 
> ***We could just increase `ShadowPages`, right?***
> 
> Sure, but the point is we have no hs-err file, so we don't even know it was a 
> stack overflow. One would have to start debugging, which is work-intensive 
> and may not even be possible in a customer scenario. And for buggy recursive 
> code, any `ShadowPages` value might be too small. The code would need to be 
> fixed.
> 
> ### Implementation
> 
> The patch uses alternative signal stacks. That is a simple, robust solution 
> with few moving parts. It works out of the box for all cases: 
> - Stack overflows inside native JNI code from Java 
> - Stack overflows inside Hotspot-internal JavaThread children (e.g. 
> CompilerThread, AttachListenerThread etc)
> - Stack overflows in non-Java threads (e.g. VMThread, ConcurrentGCThread)
> - Stack overflows in outside threads that are attached to the JVM, e.g. 
> third-party JVMTI threads
> 
> The drawback of this simplicity is that it is not suitable for always-on 
> production use. That is du...

I guess the cost of the alternate signal stacks comes mostly from all of those 
mmap calls? We could amortize those by mapping several stacks per call; that 
might bring the cost down enough to make this worth enabling by default. 
Future PR thoughts.
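
To make the batching idea concrete, here is a rough sketch, not code from the PR. `AltStackPool`, `kBatch` and `kStackSize` are hypothetical names and values picked for illustration; it also assumes that the mmap calls really are the dominant cost, and it leaves out guard pages and returning stacks to the pool.

```cpp
#include <cstddef>
#include <mutex>
#include <sys/mman.h>

// Hypothetical pool: maps room for kBatch alternate signal stacks with a
// single mmap call and hands out one slice per requesting thread.
class AltStackPool {
public:
  static constexpr size_t kBatch = 8;              // assumed batch size
  static constexpr size_t kStackSize = 64 * 1024;  // assumed per-thread size

  // Returns the base of a kStackSize-byte region, or nullptr if mmap fails.
  void* allocate() {
    std::lock_guard<std::mutex> guard(_lock);
    if (_used == kBatch) {
      // Current chunk is exhausted (or none exists yet): map a new batch.
      void* p = mmap(nullptr, kBatch * kStackSize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED) {
        return nullptr;
      }
      _chunk = static_cast<char*>(p);
      _used = 0;
      _mmap_calls++;
    }
    return _chunk + (_used++ * kStackSize);
  }

  // Number of mmap calls issued so far (for illustration only).
  size_t mmap_calls() const { return _mmap_calls; }

private:
  std::mutex _lock;
  char* _chunk = nullptr;
  size_t _used = kBatch;   // forces a mapping on the first allocate()
  size_t _mmap_calls = 0;
};
```

A thread would then put the returned pointer into `stack_t::ss_sp`, set `ss_size` to `kStackSize`, and call `sigaltstack`. With a batch of 8, nine threads cost two mmap calls instead of nine.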

src/hotspot/os/posix/threadAltSigStack_posix.cpp line 44:

> 42:   // Note: the first thread initializing this would be the main thread which
> 43:   // still runs single-threaded. It is invoked after initial argument parsing.
> 44:   static size_t value = 0;

With just slightly more code you get a thread-safe static (the so-called 
Meyers singleton, after Scott Meyers):

https://godbolt.org/z/zd3E5dxT4
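
In general the pattern looks roughly like the sketch below. This is not the code from the patch or from the link above: `compute_alt_stack_size`, `alt_stack_size`, `g_init_calls` and the 64 KB value are hypothetical stand-ins. It relies on the C++11 guarantee that a function-local static is initialized exactly once even under concurrent calls, and assumes the build does not disable that (for example with GCC's `-fno-threadsafe-statics`).

```cpp
#include <atomic>
#include <cstddef>

// Counts how often the initializer runs; only here to show it runs once.
static std::atomic<int> g_init_calls{0};

// Hypothetical stand-in for whatever computes the real value.
static size_t compute_alt_stack_size() {
  g_init_calls.fetch_add(1);
  return 64 * 1024;  // arbitrary illustrative size
}

// Function-local static: the initializer runs on the first call only, and
// concurrent first calls block until the initialization has finished.
size_t alt_stack_size() {
  static const size_t value = compute_alt_stack_size();
  return value;
}
```

The value is computed lazily on first use and callers no longer depend on the "main thread is still single-threaded" assumption from the comment.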

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29559#issuecomment-3910269770
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r2813805620
