On Sat, 20 Sep 2025 07:33:34 GMT, Thomas Stuefe <[email protected]> wrote:
> ASAN, when catching an error, will abort the process.
>
> Two things control this:
> 1) the compiler option `-fsanitize-recover=address` (resp.
> `-fno-sanitize-recover=address`). This controls whether, once ASAN returns
> from its error report, the compiler-generated ASAN stubs will abort the
> process. This is by default set to `-fno-sanitize-recover=address`, so we
> won't recover.
> 2) The runtime option `halt_on_error` controls whether ASAN itself returns
> from its error handler or whether it aborts the process. This, by default, is
> set to `1`, so by default ASAN aborts.
>
> We "double abort" in the sense that two options are overlaid and both prevent
> the process from continuing.
>
> I propose that we set, during build time for ASAN builds, the option
> `-fsanitize-recover=address`. Now, we can control whether to abort or not
> using the runtime setting `halt_on_error=0`. By default, we still will abort,
> since `halt_on_error=1`. So, the default behavior won't change. However, we
> can now at least decide to do it differently.
>
> What would that give us?
>
> By aborting right away, ASAN denies the JVM the option to catch the error and
> write an hs-err file. Of course, not every error that ASAN catches will
> result in a segfault or in an assertion. The JVM could lurch on for a bit
> before it stumbles. However, the chance for the JVM to stop on its own very
> soon after a memory corruption happens is pretty good. Then we get a hs-err
> file and a crash dump in close correlation to the error ASAN caught.
>
> And even if there is no close relationship between the original ASAN error
> and the eventual segfault/assertion (think ASAN sees a double free, JVM
> continues, and after a while asserts somewhere else as a remote consequence
> of the error - the stacks in the hs-err file won't be related to the original
> error) - the hs-err file is chock-full of helpful information about running
> threads (see also
> [JDK-8368124](https://bugs.openjdk.org/browse/JDK-8368124)), memory mappings,
> JVM flags, etc. All of that would make it easier to understand the ASAN
> report.
>
> And even if the JVM survives, one can still attach to the still living
> process and grab thread dumps, VM.info reports, heap dumps etc.

I found a better and more reliable way to get hs-err files with ASAN: https://github.com/openjdk/jdk/pull/27446.

So for me, the main motivation for this change is gone, and I wonder whether I should just close this PR.

Still, I think it is worthwhile to have at least the option to continue running the JVM, mostly because the new alternative proposal, although a lot better than this one, relies on the ability to install ASAN callbacks, and not all ASAN versions may allow that.

@afshin-zafari

> The logic behind the ASAN is that when an error is detected, the program is
> in an unstable state. So letting it to continue may produce more errors that
> more likely are caused by the first error.

I am aware of that, and count on that. Please see the motivation I gave in the description above.

> Instead of turning on/off the whole build/run of the program, we can
> skip/exclude the places (i.e., functions) that we don't want the ASAN reports
> (using ATTRIBUTE_NO_ASAN).

That would not be very useful though: you would hide the error. I want to see the error. I just want the JVM to continue after that error report has been written to stderr.

@kimbarrett

> But maybe we should consider this PR premature, since
> `-fsanitize-recovery=address` is still experimental.

I am pretty sure it's experimental because there is no safe way in which the program could continue. So the feature itself is stable, but the target program would be unstable. I don't see what the disadvantage would be in allowing that, though. Whoever uses ASAN in a way like this must know what they are doing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27404#issuecomment-3323150387
