This PR limits the number of cases in which we deoptimize frames when closing a 
shared Arena. The initial intent of this was to improve the performance of 
shared arena closure in cases where a lot of threads are accessing and closing 
shared arenas at the same time (see attached benchmark), but unfortunately even 
disabling deoptimization altogether does not have a great effect on that 
benchmark.

Nevertheless, I think the extra logging/testing/benchmark code and the comments 
I've written, together with reducing the number of cases where we deoptimize 
(which makes it clearer exactly why we need to deoptimize in the first place), 
will be useful going forward. So, I've created this PR out of them.

In this PR:
- I've separated the stack walking code (`for_scope_method`) from the code that 
checks for a reference to the arena being closed (`is_accessing_session`), and 
added logging code to the former. That also required changing vframe code to 
accept an `outputStream*` rather than always printing to `tty`.
- Added a new test (`TestConcurrentClose`), that tries to close many shared 
arenas at the same time, in order to stress that use case.
- Added a new benchmark (`ConcurrentClose`), that stresses the cases where many 
threads are accessing and closing shared arenas.
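
For readers less familiar with the use case, here is a minimal sketch of the kind of workload the new test and benchmark stress: several threads that each repeatedly create, access, and close a shared arena at the same time. This is not the code of `TestConcurrentClose` or `ConcurrentClose`; the class name `ConcurrentCloseSketch`, the thread count, and the iteration count are assumptions picked for illustration, and it assumes a JDK where `java.lang.foreign` is final (22 or later).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentCloseSketch {
    // Hypothetical parameters, not taken from the PR.
    static final int THREADS = 8;
    static final int ITERATIONS = 200;

    // Each worker waits for the common start signal, then repeatedly
    // creates a shared arena, touches its memory, and closes it.
    static int runWorker(CountDownLatch start) throws InterruptedException {
        start.await();
        int closed = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            Arena arena = Arena.ofShared();
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
            segment.set(ValueLayout.JAVA_INT, 0, i);
            if (segment.get(ValueLayout.JAVA_INT, 0) != i) {
                throw new AssertionError("unexpected value read back");
            }
            // Closing a shared arena is the operation this PR is about.
            arena.close();
            closed++;
        }
        return closed;
    }

    // Runs all workers concurrently and returns the total number of closed arenas.
    static int closeConcurrently() {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        try {
            CountDownLatch start = new CountDownLatch(1);
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < THREADS; t++) {
                results.add(pool.submit(() -> runWorker(start)));
            }
            start.countDown(); // release all workers at once
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("closed arenas: " + closeConcurrently());
    }
}
```

Swapping `Arena.ofShared()` for `Arena.ofConfined()` in the sketch gives a rough analogue of the confined case, since each worker only touches its own arena.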

I've done several benchmark runs with different numbers of threads. The 
confined case stays much more consistent, while the time spent in the shared 
case balloons quickly once there are more than 4 threads:


Benchmark                     Threads   Mode  Cnt     Score     Error  Units
ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
ConcurrentClose.confinedAccess     32   avgt   10    25.517 ±   1.069  us/op
ConcurrentClose.confinedAccess      1   avgt   10    12.398 ±   0.098  us/op


(I manually added the `Threads` column, by the way.)

Testing: tier 1-4

-------------

Commit messages:
 - polish
 - slightly improve comment
 - tweak comment
 - improve benchmark parameters
 - cleanup
 - add benchmark
 - add note about lacking session oop at safepoint
 - Only deopt if necessary
 - refactor close handshake
 - Return before deoptimizing of target thread already has async exception

Changes: https://git.openjdk.org/jdk/pull/20158/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20158&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335480
  Stats: 428 lines in 6 files changed: 339 ins; 19 del; 70 mod
  Patch: https://git.openjdk.org/jdk/pull/20158.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158

PR: https://git.openjdk.org/jdk/pull/20158
