On Fri, 21 Jan 2022 17:02:38 GMT, Maxim Kartashev <d...@openjdk.java.net> wrote:
> These crashes were not reproducible, so the fix is based on a hypothesis that > there are two possible reasons for them: > 1. `makeDefaultConfig()` returning `NULL`. > 2. A race condition when the number of screens changes. > The race scenario: `X11GraphisDevice.makeDefaultConfiguration()` is called on > EDT so the call can race with `X11GraphisDevice.invalidate()` that re-sets > the screen number of the device; the latter is invoked on the `AWT-XAWT` > thread from `X11GraphicsEnvironment.rebuildDevices()`. So by the time > `makeDefaultConfiguration()` makes a native call with the device's current > screen number, the `x11Screens` array maintained by the native code could > have become shorter. And the native methods like > `Java_sun_awt_X11GraphicsDevice_getConfigColormap()` assume that the number > passed to them is always current and valid. The AWT lock only protects > against the changes during the native methods invocation and does not protect > against them being called with an outdated screen number. With a larger > screen number, those methods read past the end of the `x11Screens` array. > > The fix for (1) is to die gracefully instead of crashing in an attempt to > de-reference a `NULL` pointer, which might happen upon returning from > `makeDefaultConfig()` in `getAllConfigs()`. > > The fix for (2) is to eliminate the race by protecting > `X11GraphisDevice.screen` with the AWT lock such that it doesn't change when > the native methods working with it are active. > > We've been shipping JetBrains Runtime with this fix for a few months now and > there were no crash reports with those specific patterns against the versions > with the fix. What's preventing me from going down the path of returning some default instead of locking are these two considerations: 1. By returning a default value when the screen number "suddenly" goes out range, the function will essentially lie. And it's going to take quite a while for me to prove that this lie will not have a long-lasting adverse effect. 2. The idea of testing if the fix actually cures the crash suggested by @mrserb will work, but only to prove that one particular hypothesis about the crash is correct. It won't help to prove that the whole process remains in a stable state and continues functioning correctly. But our our users kind of proved that already and I'm very reluctant to replaces what I believe is already working with what will probably work. ------------- PR: https://git.openjdk.java.net/jdk/pull/7182