On Wed, 5 Jan 2022 16:59:30 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> wrote:
> This patch fixes a performance issue when dereferencing memory segments > backed by different kinds of scopes. When looking at inline traces, I found > that one of our benchmark, namely `LoopOverPollutedSegment` was already > hitting the ceiling of the bimorphic inline cache, specifically when checking > liveness of the segment scope in the memory access hotpath > (`ResourceScopeImpl::checkValidState`). The benchmark only used segments > backed by confined and global scope. I then added (in the initialization > "polluting" loop) segments backed by a shared scope, and then the benchmark > numbers started to look as follows: > > > Benchmark Mode Cnt Score > Error Units > LoopOverPollutedSegments.heap_segment_floats_VH avgt 30 7.004 ? > 0.089 ms/op > LoopOverPollutedSegments.heap_segment_floats_instance avgt 30 7.159 ? > 0.016 ms/op > LoopOverPollutedSegments.heap_segment_ints_VH avgt 30 7.017 ? > 0.110 ms/op > LoopOverPollutedSegments.heap_segment_ints_instance avgt 30 7.175 ? > 0.048 ms/op > LoopOverPollutedSegments.heap_unsafe avgt 30 0.243 ? > 0.004 ms/op > LoopOverPollutedSegments.native_segment_VH avgt 30 7.366 ? > 0.036 ms/op > LoopOverPollutedSegments.native_segment_instance avgt 30 7.305 ? > 0.098 ms/op > LoopOverPollutedSegments.native_unsafe avgt 30 0.238 ? > 0.002 ms/op > > > That is, since now we have *three* different kinds of scopes (confined, > shared and global), the call to the liveness check can no longer be inlined. > One solution could be, as we do for the *base* accessor, to add a scope > accessor to all memory segment implementation classes. But doing so only > works ok for heap segments (for which the scope accessor just returns the > global scope constants). For native segments, we're still megamorphic (as a > native segment can be backed by all kinds of scopes). > > In the end, it turned out to be much simpler to just make the liveness check > monomorphic, since there's so much sharing between the code paths already. > With that change, numbers of the tweaked benchmark go back to normal: > > > Benchmark Mode Cnt Score > Error Units > LoopOverPollutedSegments.heap_segment_floats_VH avgt 30 0.241 ? > 0.003 ms/op > LoopOverPollutedSegments.heap_segment_floats_instance avgt 30 0.244 ? > 0.003 ms/op > LoopOverPollutedSegments.heap_segment_ints_VH avgt 30 0.242 ? > 0.003 ms/op > LoopOverPollutedSegments.heap_segment_ints_instance avgt 30 0.248 ? > 0.001 ms/op > LoopOverPollutedSegments.heap_unsafe avgt 30 0.247 ? > 0.013 ms/op > LoopOverPollutedSegments.native_segment_VH avgt 30 0.245 ? > 0.004 ms/op > LoopOverPollutedSegments.native_segment_instance avgt 30 0.245 ? > 0.001 ms/op > LoopOverPollutedSegments.native_unsafe avgt 30 0.247 ? > 0.005 ms/op > > > Note that this patch tidies up a bit the usage of `checkValidState` vs. > `checkValidStateSlow`. The former should only really be used in the hot path, > while the latter is a more general routine which should be used in > non-performance critical code. Making `checkValidState` monomorphic caused > the `ScopeAccessError` to be generated in more places, so I needed to either > update the usage to use the safer `checkValidStateSlow` (where possible) or, > (in `Buffer` and `ConfinedScope`) just add extra wrapping. src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java line 190: > 188: @ForceInline > 189: public final void checkValidState() { > 190: if (owner != null && owner != Thread.currentThread()) { For consistency we could change code `checkValidStateSlow` to refer directly to `owner`. It would be satisfying, but I don't know if it's possible, to compose `checkValidStateSlow` from `checkValidState` e.g. public final checkValidStateSlow() { checkValidState(); if (!isAlive() { ... } } ------------- PR: https://git.openjdk.java.net/jdk18/pull/82