This patch fixes a performance issue when dereferencing memory segments backed 
by different kinds of scopes. When looking at inline traces, I found that one 
of our benchmarks, namely `LoopOverPollutedSegments`, was already hitting the 
ceiling of the bimorphic inline cache, specifically when checking the liveness 
of the segment scope in the memory access hot path 
(`ResourceScopeImpl::checkValidState`). The benchmark only used segments backed 
by confined and global scopes. I then added (in the initialization "polluting" 
loop) segments backed by a shared scope (a sketch of such a loop is shown after 
the table below), and the benchmark numbers started to look as follows:


Benchmark                                              Mode  Cnt  Score   Error  Units
LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  7.004 ± 0.089  ms/op
LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  7.159 ± 0.016  ms/op
LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  7.017 ± 0.110  ms/op
LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  7.175 ± 0.048  ms/op
LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.243 ± 0.004  ms/op
LoopOverPollutedSegments.native_segment_VH             avgt   30  7.366 ± 0.036  ms/op
LoopOverPollutedSegments.native_segment_instance       avgt   30  7.305 ± 0.098  ms/op
LoopOverPollutedSegments.native_unsafe                 avgt   30  0.238 ± 0.002  ms/op


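For context, the sketch below (not the actual benchmark, which lives in the JDK 
micro suite; class name and sizes are made up) shows the kind of "polluting" 
access pattern described above, using the JDK 18 `jdk.incubator.foreign` API: 
it mixes segments backed by confined, shared and global scopes, so the liveness 
check sees three different scope implementations at the same call site.

```java
import jdk.incubator.foreign.MemorySegment;
import jdk.incubator.foreign.ResourceScope;
import jdk.incubator.foreign.ValueLayout;

public class PollutedAccessSketch {

    public static void main(String[] args) {
        ResourceScope confinedScope = ResourceScope.newConfinedScope();
        ResourceScope sharedScope = ResourceScope.newSharedScope();

        MemorySegment confined = MemorySegment.allocateNative(1024, confinedScope);
        MemorySegment shared = MemorySegment.allocateNative(1024, sharedScope);
        MemorySegment heap = MemorySegment.ofArray(new int[256]); // global scope

        // Accessing all three segments from the same loop means the scope
        // liveness check observes confined, shared and global scopes.
        for (MemorySegment segment : new MemorySegment[] { confined, shared, heap }) {
            for (long i = 0; i < 256; i++) {
                segment.setAtIndex(ValueLayout.JAVA_INT, i, (int) i);
            }
        }
    }
}
```
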
That is, since we now have *three* different kinds of scopes (confined, shared 
and global), the call to the liveness check can no longer be inlined. One 
solution could be, as we do for the *base* accessor, to add a scope accessor to 
all memory segment implementation classes. But doing so only works well for 
heap segments (for which the scope accessor simply returns the global scope 
constant). For native segments, the call site is still megamorphic, as a native 
segment can be backed by any kind of scope.
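
To make the limitation of that alternative concrete, here is a schematic sketch 
(stand-in types; the real code would involve the internal segment and scope 
implementation classes): the accessor is a constant for heap segments, but for 
native segments it merely moves the polymorphism from the segment to the scope.

```java
// Stand-in types, loosely modelled on the internal segment/scope hierarchy.
interface ScopeSketch {
    ScopeSketch GLOBAL = () -> { };  // the global scope is never closed
    void checkValidState();
}

abstract class SegmentSketch {
    abstract ScopeSketch scope();    // the hypothetical per-implementation accessor

    final void checkAccess() {
        scope().checkValidState();   // the liveness check in the access hot path
    }
}

final class HeapSegmentSketch extends SegmentSketch {
    @Override
    ScopeSketch scope() {
        return ScopeSketch.GLOBAL;   // always the same constant -> easy to inline
    }
}

final class NativeSegmentSketch extends SegmentSketch {
    private final ScopeSketch scope; // may be confined, shared or global

    NativeSegmentSketch(ScopeSketch scope) {
        this.scope = scope;
    }

    @Override
    ScopeSketch scope() {
        return scope;                // three possible scope types -> still megamorphic
    }
}
```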

In the end, it turned out to be much simpler to just make the liveness check 
monomorphic, since there is already so much sharing between the code paths. 
With that change, the numbers for the tweaked benchmark go back to normal:


Benchmark                                              Mode  Cnt  Score   Error  Units
LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  0.241 ± 0.003  ms/op
LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  0.244 ± 0.003  ms/op
LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  0.242 ± 0.003  ms/op
LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  0.248 ± 0.001  ms/op
LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.247 ± 0.013  ms/op
LoopOverPollutedSegments.native_segment_VH             avgt   30  0.245 ± 0.004  ms/op
LoopOverPollutedSegments.native_segment_instance       avgt   30  0.245 ± 0.001  ms/op
LoopOverPollutedSegments.native_unsafe                 avgt   30  0.247 ± 0.005  ms/op


Note that this patch also tidies up the usage of `checkValidState` vs. 
`checkValidStateSlow` a bit. The former should only really be used in the hot 
path, while the latter is a more general routine which should be used in 
non-performance-critical code. Making `checkValidState` monomorphic caused the 
`ScopedAccessError` to be generated in more places, so I needed to either 
switch the usage to the safer `checkValidStateSlow` (where possible) or (in 
`Buffer` and `ConfinedScope`) just add extra wrapping.
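
To make the `checkValidState` vs. `checkValidStateSlow` split concrete, here is 
a rough sketch of the shape such a scope class could take (field names, 
exception types and messages are assumptions, not the actual patch; the real 
hot path raises the lightweight `ScopedAccessError` mentioned above): a single 
final, inline-friendly hot-path check on the common superclass, plus a more 
general slow-path routine for non-performance-critical code.

```java
// Sketch only; the real class is an internal scope implementation.
abstract class ResourceScopeSketch {

    static final int OPEN = 0, CLOSED = -1;

    final Thread owner;        // non-null only for confined scopes
    volatile int state = OPEN;

    ResourceScopeSketch(Thread owner) {
        this.owner = owner;
    }

    // Hot path: final, so every access dispatches to the same method no matter
    // whether the backing scope is confined, shared or global, keeping the
    // call site monomorphic and inlineable.
    final void checkValidState() {
        if (owner != null && owner != Thread.currentThread()) {
            throw new IllegalStateException("Attempted access outside owning thread");
        }
        if (state == CLOSED) {
            throw new IllegalStateException("Already closed");
        }
    }

    // Slow path: the general-purpose routine for non-performance-critical
    // code; subclasses (e.g. a shared scope) could refine it if needed.
    void checkValidStateSlow() {
        checkValidState();
    }
}
```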

-------------

Commit messages:
 - Initial push

Changes: https://git.openjdk.java.net/jdk18/pull/82/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=82&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8279527
  Stats: 108 lines in 8 files changed: 41 ins; 49 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk18/pull/82.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/82/head:pull/82

PR: https://git.openjdk.java.net/jdk18/pull/82
