So my takeaways from this:
1) Future Reflection will use MethodHandles, meaning my variants
depending on Reflection will have similar times to MethodHandles very soon.
2) MethodHandle resolve code may improve in the future, and my
MethodHandle variants would then probably become faster.
3) There is actually not much I can do right now, except maybe having
more pre-generated MethodHandles.
bye Jochen
On 12.09.23 15:08, Claes Redestad wrote:
Hi,
I wasn’t suggesting full-blown reflection, but to archive some kind of
lookup table when generating the pre-generated LambdaForms. This would
add a little footprint and an extra lookup with negligible cost on every
lookup. So it’d add a little overhead when the speculation wins and
remove a larger cost on speculation failures.
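A minimal sketch of that lookup-table idea, in plain Java rather than the actual JDK internals. The class name, the `generated` set and the `resolver` function are all hypothetical stand-ins; the real code would consult whatever table is recorded when the LambdaForms are pre-generated.

```java
import java.util.Set;
import java.util.function.Function;

public class PregeneratedLookupSketch {
    // Hypothetical table of form names recorded at pre-generation time.
    private final Set<String> generated;
    // Hypothetical stand-in for the expensive speculative resolution call.
    private final Function<String, Object> resolver;
    // Counts how often the expensive call is actually made (illustration only).
    int resolverCalls = 0;

    PregeneratedLookupSketch(Set<String> generated, Function<String, Object> resolver) {
        this.generated = generated;
        this.resolver = resolver;
    }

    Object lookup(String name) {
        // Extra table lookup on every call: small overhead on hits.
        if (!generated.contains(name)) {
            // Known miss: skip the speculative call and its failure cost.
            return null;
        }
        resolverCalls++;
        return resolver.apply(name);
    }

    public static void main(String[] args) {
        PregeneratedLookupSketch table =
                new PregeneratedLookupSketch(Set.of("formA"), n -> "resolved:" + n);
        System.out.println(table.lookup("formA"));
        System.out.println(table.lookup("formB"));
        System.out.println("expensive calls: " + table.resolverCalls);
    }
}
```

The trade-off is the one described above: every lookup pays for one set membership test, and misses never reach the expensive path.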
But yes, fixing this in the runtime code would be great. Main issue is
that the linkResolver code you point out is shared between
resolveOrNull/resolveOrFail and regular bytecode linking - and we
probably shouldn’t change the semantics of the latter. If you can come
up with a patch idea that solves this issue for resolveOrNull I’d be
happy to file an RFE and help get it through review over at
[email protected]
On 11 Sep 2023, at 14:50, [email protected] wrote:
Hi Claes,
After looking at the usages of resolveOrFail in VarHandle, I believe
changing the runtime's THROW_MSG_NULL template occurrences in
linkResolver would be a better approach, for resolveOrFail is
currently the most efficient way for finding particular members;
reflection, on the other hand, has to perform a search over the list
of all methods to find one that's accessible.
On an unrelated note, VarForm can probably replace each MemberName that
failed resolution with a dummy one, like that for Object.toString, so
we don't need to query resolveOrNull repeatedly.
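The general pattern behind that suggestion is negative caching with a sentinel. The following is only a rough sketch under that assumption: `NegativeCacheSketch`, `MISSING` and `resolver` are hypothetical names, and this is not how VarForm is written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class NegativeCacheSketch {
    // Sentinel that marks "resolution was tried and failed" (hypothetical).
    private static final Object MISSING = new Object();

    // Not thread-safe; a plain map keeps the sketch short.
    private final Map<String, Object> cache = new HashMap<>();
    // Hypothetical stand-in for a resolveOrNull-style call.
    private final Function<String, Object> resolver;
    // Counts calls to the underlying resolver (illustration only).
    int resolverCalls = 0;

    NegativeCacheSketch(Function<String, Object> resolver) {
        this.resolver = resolver;
    }

    Object resolveOrNull(String name) {
        Object cached = cache.get(name);
        if (cached == null) {
            // First request for this name: resolve once and remember the outcome,
            // storing the sentinel when resolution fails.
            resolverCalls++;
            Object resolved = resolver.apply(name);
            cached = (resolved != null) ? resolved : MISSING;
            cache.put(name, cached);
        }
        return cached == MISSING ? null : cached;
    }

    public static void main(String[] args) {
        NegativeCacheSketch cache =
                new NegativeCacheSketch(n -> n.equals("present") ? "member" : null);
        cache.resolveOrNull("absent");
        cache.resolveOrNull("absent");
        System.out.println("resolver calls: " + cache.resolverCalls);
    }
}
```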
/Chen
On Mon, Sep 11, 2023 at 5:28 PM Claes Redestad <[email protected]> wrote:
Hi,
It’s been something I’ve wanted to get rid of, sure. An
alternative that wouldn’t require changes to the runtime code
would be to store a table of LFs that have actually been generated
and skip the speculative VM call. This can be done in a few
different ways, and would add a little overhead on hits but remove
the exception overhead (which clutters JFR recordings) on misses.
/Claes
On 11 Sep 2023, at 11:02, [email protected] wrote:
Hello Jochen and Claes,
I have done a little debugging and have found the cause: looking up
a pre-generated LambdaForm (mentioned by Claes) causes the VM to
initialize a NoSuchMethodError [1] that's later silently dropped [2].
By then the NoSuchMethodError constructor has already executed and
the stack trace has been filled, causing a significant overhead, as
shown in this [3] JMC rendering of a JFR recording.
You can capture this NoSuchMethodError construction with an IDE
debugger, even when running an empty main, on newer JDK versions. I
tested with a breakpoint in the NoSuchMethodError(String) constructor
and it hits twice (because the instrumentation agent uses reflection,
which now depends on Method Handles after JEP 416 in Java 18).
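To illustrate why a silently dropped exception is still not free, here is a small Java-level sketch. It does not reproduce the HotSpot code path from [1] and [2]; it only shows that constructing a throwable captures a stack trace unless that is explicitly disabled. `StacklessError` is a hypothetical class added for comparison.

```java
public class StackTraceCostSketch {
    // Hypothetical error type that opts out of stack trace capture via the
    // protected Error(String, Throwable, boolean, boolean) constructor.
    static final class StacklessError extends Error {
        StacklessError(String message) {
            // enableSuppression = false, writableStackTrace = false
            super(message, null, false, false);
        }
    }

    public static void main(String[] args) {
        // The stack is walked in the constructor, even if the error is never thrown.
        Error eager = new NoSuchMethodError("foo");
        Error stackless = new StacklessError("foo");

        System.out.println("NoSuchMethodError frames: " + eager.getStackTrace().length);
        System.out.println("StacklessError frames: " + stackless.getStackTrace().length);
    }
}
```

With default VM settings the first count is non-zero and the second is zero. The cost of the first grows with the depth of the stack at the point of construction.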
I think a resolution would be to modify linkResolver so that it
can also resolve speculatively instead of always throwing
exceptions, but this might be too invasive and I want to hear
from other developers such as Claes, who authored the old
resolveOrNull silent-dropping patch.
Looking forward to a solution,
Chen Liang
[1]: https://github.com/openjdk/jdk/blob/a04c6c1ac663a1eab7d45913940cb6ac0af2c11c/src/hotspot/share/interpreter/linkResolver.cpp#L773
[2]: https://github.com/openjdk/jdk/blob/a04c6c1ac663a1eab7d45913940cb6ac0af2c11c/src/hotspot/share/prims/methodHandles.cpp#L794-L796
[3]: https://cr.openjdk.org/~liach/mess/invokerbytecodegen-cache-miss.png
On Mon, Sep 11, 2023 at 4:41 PM Jochen Theodorou <[email protected]> wrote:
I changed my testing a bit to have more infrastructure types and test
with a fresh VM each time.
The scenario is still the same: call a method foo with argument 1. foo
does nothing but return 0. Implement the call.
indyDirect:
bootstrap method selects method and produces constant call-site
indyDoubleDispatch:
bootstrap selects a selector method and produces a mutable call-site.
selector then selects the target method and sets it in the call-site
reflective:
an inner class is used to select the method using reflection and
directly invoke it.
reflectiveCached:
same as reflective but caching the selected method
staticCallSite:
I have the call abstracted and replace what is called after method
selection. Here with a direct call to the method using normal Java
runtimeCallSite:
I have the call abstracted like staticCallSite, but instead of
replacing with a direct call I create a class at runtime, which does
the direct call for me.
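For orientation, three of these variants could look roughly like the sketch below. This is not the benchmark code from the tests described here: all names are hypothetical, the inner class is omitted, and the indyDirect variant only calls a bootstrap-style method by hand, since a real invokedynamic instruction needs generated bytecode.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class CallVariantsSketch {
    // The target from the scenario: called with 1, does nothing, returns 0.
    public static int foo(int x) {
        return 0;
    }

    // "reflective": select the method via reflection on every call and invoke it.
    static int reflective(int arg) {
        try {
            Method m = CallVariantsSketch.class.getMethod("foo", int.class);
            return (Integer) m.invoke(null, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Method cachedMethod;

    // "reflectiveCached": same, but the selected Method is kept after the first call.
    static int reflectiveCached(int arg) {
        try {
            if (cachedMethod == null) {
                cachedMethod = CallVariantsSketch.class.getMethod("foo", int.class);
            }
            return (Integer) cachedMethod.invoke(null, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Bootstrap-style method: selects the target and returns a constant call-site.
    static CallSite bootstrapDirect(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(CallVariantsSketch.class, name, type);
        return new ConstantCallSite(target);
    }

    // Approximation of "indyDirect": run the bootstrap by hand, then invoke the site.
    static int indyDirectLike(int arg) {
        try {
            MethodType type = MethodType.methodType(int.class, int.class);
            CallSite site = bootstrapDirect(MethodHandles.lookup(), "foo", type);
            return (int) site.dynamicInvoker().invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(reflective(1));
        System.out.println(reflectiveCached(1));
        System.out.println(indyDirectLike(1));
    }
}
```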
My interest is in the performance of the first few calls. My
experiments show that after at most 5 calls there is no significant
performance change for a long time. But long-term performance is
secondary right now.
Out of these implementations it is no surprise that staticCallSite has
the least cost, but it is almost on par with the reflective variant.
That really surprised me. It seems reflection has come a long way since
the old times. There is probably still a lot of cost in the long term,
but well, I focus on the short term here right now.
The cached variant does not really differ much, but if reflection gets a
score of 41, then the cached variant is at 105. That is surprisingly
much for an additional if condition. But if you think of how many
instructions that involves, maybe it is not that surprising. indyDirect
has almost the same initial cost as reflectiveCached. indyDoubleDispatch
follows with a score of 149... which looks very much like
reflective+indyDirect-"a small something". At 361 we find
runtimeCallSite, the slowest by far. The numbers used to be quite
different for this, but back then MagicAccessor was an option to reduce
cost.
My conclusion so far: callsite generation is a questionable option, not
only because of performance, but also because of the module system.
Though we have cases where we can use the static variant.
The next best is actually reflective. But how would you combine
reflective with something that has better long-term performance? Even a
direct call with indy costs much more.
I think I have to change my tests. I think I should test a scenario in
which I have a quite big number - like 1 million - of one-time
call-sites to get really conclusive numbers... Of course that means 1
million direct method handles for indy.
Well, I will write again if I have more numbers.
bye Jochen
_______________________________________________
mlvm-dev mailing list
[email protected]
https://mail.openjdk.org/mailman/listinfo/mlvm-dev