On Wed, 25 Mar 2026 11:44:45 GMT, Thomas Stuefe <[email protected]> wrote:

>> This patch enables hs-err file generation for native out-of-stack cases. It 
>> is an optional analysis feature one can use when JVMs mysteriously vanish - 
>> typically, vanishing JVMs are either native stack overflows or OOM kills.
>> 
>> This was motivated by the analysis difficulties of bugs like 
>> https://bugs.openjdk.org/browse/JDK-8371630. There are many more examples.
>> 
>> ### Motivation
>> 
>> Today, when native stack overflows, the JVM dies immediately without an 
>> hs-err file. This is because C++-compiled code does not bang - if the stack 
>> is too small, we walk right into whatever caps the stack. That might be our 
>> own yellow/red guard pages, native guard pages placed by libc or kernel, or 
>> possibly unmapped area after the end of the stack. 
>> 
>> Since we don't have a stack left to run the signal handler on, we cannot 
>> produce the hs-err file. If one is very lucky, the libc writes a short 
>> "Stack overflow" to stderr. But usually not: if it is a JavaThread and we 
>> run into our own yellow/red pages, it counts as a simple segmentation fault 
>> from the OS's point of view, since the fault address is inside of what it 
>> thinks is a valid pthread stack. So, typically, you just see "Segmentation 
>> fault" on stderr.
>> 
>> ***Why do we need this patch? Don't we bang enough space for native code we 
>> call?***
>> 
>> We bang when entering a native function from Java. The maximum stack size we 
>> assume at that time might not be enough; moreover, the native code may be 
>> buggy or just too deeply or infinitely recursive. 
>> 
>> ***We could just increase `ShadowPages`, right?***
>> 
>> Sure, but the point is we have no hs-err file, so we don't even know it was 
>> a stack overflow. One would have to start debugging, which is work-intensive 
>> and may not even be possible in a customer scenario. And for buggy recursive 
>> code, any `ShadowPages` value might be too small. The code would need to be 
>> fixed.
>> 
>> ### Implementation
>> 
>> The patch uses alternative signal stacks. That is a simple, robust solution 
>> with few moving parts. It works out of the box for all cases: 
>> - Stack overflows inside native JNI code from Java 
>> - Stack overflows inside Hotspot-internal JavaThread children (e.g. 
>> CompilerThread, AttachListenerThread etc)
>> - Stack overflows in non-Java threads (e.g. VMThread, ConcurrentGCThread)
>> - Stack overflows in outside threads that are attached to the JVM, e.g. 
>> third-party JVMTI threads
>> 
>> The drawback of this simplicity is that it is not suitable for always-on 
>> production use. That is due t...
>
> Thomas Stuefe has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix runtime/ErrorHandling/ReattemptErrorTest.java

src/hotspot/os/posix/threadAltSigStack_posix.cpp line 69:

> 67:       LogStream ls(lt);
> 68:       ls.print("Thread %zu alternate signal stack: %s (",
> 69:           os::current_thread_id(), (ss->ss_flags == SS_DISABLE) ? 
> "disabled" : "enabled");

Suggestion:

      ls.print("Thread %zu alternate signal stack: %s (",
               os::current_thread_id(), (ss->ss_flags == SS_DISABLE) ? 
"disabled" : "enabled");

src/hotspot/os/posix/threadAltSigStack_posix.cpp line 78:

> 76:       ls.print("Thread %zu failed to %s alternative signal stack (%s) (",
> 77:           os::current_thread_id(), (ss->ss_flags == SS_DISABLE) ? 
> "disable" : "enable",
> 78:           os::errno_name(errno));

Suggestion:

      ls.print("Thread %zu failed to %s alternative signal stack (%s) (",
               os::current_thread_id(), (ss->ss_flags == SS_DISABLE) ? 
"disable" : "enable",
               os::errno_name(errno));

src/hotspot/share/runtime/globals.hpp line 1982:

> 1980:           "binary search over simple linear search." )                  
>     \
> 1981:                                                                         
>     \
> 1982:   product(bool, UseAltSigStacks, false, DIAGNOSTIC,                     
>     \

I think I'd prefer if all this code was `ifndef WINDOWS` rather than having 
flags that are ignored. Or pushed down into the `os/<os>/globals_<os>.hpp`  
files.

src/hotspot/share/runtime/globals.hpp line 1983:

> 1981:                                                                         
>     \
> 1982:   product(bool, UseAltSigStacks, false, DIAGNOSTIC,                     
>     \
> 1983:           "Enable the use of alternative signal stacks (Non-Windows 
> only.") \

Suggestion:

          "Enable the use of alternative signal stacks during SIGBUS and 
SIGSEGV crash" \
          "handling. (Non-Windows only.")                                       
        \

src/hotspot/share/runtime/thread.hpp line 641:

> 639:   bool is_in_primary_or_alternate_stack(address adr) const {
> 640:     return is_in_full_stack(adr) || is_in_alternate_stack(adr);
> 641:   }

This could be debug only.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007171692
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007172948
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007215207
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007208744
PR Review Comment: https://git.openjdk.org/jdk/pull/29559#discussion_r3007189290

Reply via email to