On Fri, 21 Jan 2022 17:02:38 GMT, Maxim Kartashev <d...@openjdk.java.net> wrote:

> These crashes were not reproducible, so the fix is based on a hypothesis that 
> there are two possible reasons for them:
> 1. `makeDefaultConfig()` returning `NULL`.
> 2. A race condition when the number of screens changes.
> The race scenario: `X11GraphicsDevice.makeDefaultConfiguration()` is called on 
> EDT so the call can race with `X11GraphicsDevice.invalidate()` that re-sets 
> the screen number of the device; the latter is invoked on the `AWT-XAWT` 
> thread from `X11GraphicsEnvironment.rebuildDevices()`. So by the time 
> `makeDefaultConfiguration()` makes a native call with the device's current 
> screen number, the `x11Screens` array maintained by the native code could 
> have become shorter. And the native methods like 
> `Java_sun_awt_X11GraphicsDevice_getConfigColormap()` assume that the number 
> passed to them is always current and valid. The AWT lock only protects 
> against the changes during the native methods invocation and does not protect 
> against them being called with an outdated screen number. With a larger 
> screen number, those methods read past the end of the `x11Screens` array.
> 
> The fix for (1) is to die gracefully instead of crashing in an attempt to 
> de-reference a `NULL` pointer, which might happen upon returning from 
> `makeDefaultConfig()` in `getAllConfigs()`.
> 
> The fix for (2) is to eliminate the race by protecting 
> `X11GraphicsDevice.screen` with the AWT lock such that it doesn't change when 
> the native methods working with it are active.
> 
> We've been shipping JetBrains Runtime with this fix for a few months now and 
> there were no crash reports with those specific patterns against the versions 
> with the fix.

What's preventing me from going down the path of returning some default instead 
of locking are these two considerations:
1. By returning a default value when the screen number "suddenly" goes out of 
range, the function will essentially lie. And it's going to take quite a while 
for me to prove that this lie will not have a long-lasting adverse effect.
2. Testing whether the fix actually cures the crash, as @mrserb suggested, 
will work, but only to prove that one particular hypothesis about the crash is 
correct. It won't help to prove that the whole process remains in a stable 
state and continues functioning correctly. But our users kind of proved that 
already for the locking fix, and I'm very reluctant to replace what I believe 
is already working with what will probably work.
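
To make the difference between the two approaches concrete, here is a minimal 
sketch. It is not the JDK code: the class `ScreenRaceSketch`, its fields and 
its methods are hypothetical stand-ins (a `ReentrantLock` for the AWT lock, an 
`int[]` for the native `x11Screens` array), and it only models the shape of 
the problem.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model, not the real X11GraphicsDevice.
class ScreenRaceSketch {
    // Stand-in for the AWT lock.
    static final ReentrantLock LOCK = new ReentrantLock();
    // Stand-in for the native x11Screens array (values play the role of colormaps).
    static int[] screens = {100, 101, 102};
    // Stand-in for the device's screen number.
    static int screen = 2;

    // Locking approach: the screen number is read and used under the same
    // lock, so it cannot be changed between the read and the array access.
    static int colormapLocked() {
        LOCK.lock();
        try {
            return screens[screen];
        } finally {
            LOCK.unlock();
        }
    }

    // Default approach: the caller passes a screen number it read earlier.
    // If that number is now out of range, an answer for screen 0 is returned,
    // which is the "lie" from consideration 1.
    static int colormapWithDefault(int staleScreen) {
        LOCK.lock();
        try {
            if (staleScreen < 0 || staleScreen >= screens.length) {
                return screens[0];
            }
            return screens[staleScreen];
        } finally {
            LOCK.unlock();
        }
    }

    // Models the screen count shrinking: the array and the device's screen
    // number are updated together under the lock.
    static void shrinkTo(int n) {
        LOCK.lock();
        try {
            screens = Arrays.copyOf(screens, n);
            if (screen >= n) {
                screen = 0;
            }
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        int stale = screen;   // read before the screen count changes
        shrinkTo(1);          // one screen left
        System.out.println(colormapLocked());           // uses the current screen number
        System.out.println(colormapWithDefault(stale)); // silently answers for screen 0
    }
}
```

In this toy model both calls return the same value, but only the locked 
variant guarantees that the caller's idea of the screen number matches what 
the lookup actually used; with the default variant the caller still believes 
it asked about screen 2.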

-------------

PR: https://git.openjdk.java.net/jdk/pull/7182
