MH customization doesn't help here. The benchmark measures the cost of the MH type check plus the MH.invokeBasic() call.

For MH.invokeExact(), the type check is a pointer comparison of MH.type against the MethodType associated with the call site.
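
To illustrate the type check from the Java side, here is a minimal sketch. The class name ExactTypeCheck and its add() target are made up for this example. It only shows the observable behavior (a call site whose type differs from MH.type is rejected), not how the JVM implements the comparison:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

// Hypothetical demo class, not part of the benchmark discussed in this thread.
public class ExactTypeCheck {
    static long total;

    static void add(long v) { total += v; }

    // Returns true if invokeExact rejects a call site whose type differs from mh.type().
    static boolean mismatchIsRejected() throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findStatic(
                ExactTypeCheck.class, "add",
                MethodType.methodType(void.class, long.class));

        mh.invokeExact(1L);      // call-site type (long)void matches mh.type()
        try {
            mh.invokeExact(1);   // call-site type (int)void does not match
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(mismatchIsRejected());
    }
}
```

The call-site type is derived at compile time from the argument and return types written at the call, which is why passing an int literal instead of a long is enough to fail the check.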

MH.invokeBasic() involves the following steps:
  MethodHandle        --form-->
  LambdaForm          --vmentry-->
  MemberName          --method-->
  (ResolvedMemberName --vmtarget--> // since jdk11 [1])
  JVM_Method*         --_from_compiled_entry-->
  entry address

The only optimization I see is to remove the LambdaForm step and access MemberName (ResolvedMemberName since jdk11) directly from MethodHandle.
But there would still be 3 dereferences involved:
  MethodHandle         --form-->
  [Resolved]MemberName --vmtarget-->
  JVM_Method*          --_from_compiled_entry-->
  entry address

The downside of such a removal would be the inability to rewrite individual LambdaForms (e.g., to eliminate a redundant class initialization check) without tracking all MethodHandles which use a particular LambdaForm. Probably, we can live without that (especially in JIT-compiled code).

In total, it ends up as 4 indirect loads (3 selection steps + 1 load from MH.type for type check) and I don't see a way to cut it down further.
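
As a rough illustration of where the 4 loads come from, here is a toy model in plain Java. All of the types and field names (ToyHandle, ToyMemberName, ToyMethod, member, fromCompiledEntry) are hypothetical stand-ins for this sketch; they are not the real JDK or HotSpot data structures, and the real dispatch happens in generated code, not in Java:

```java
import java.util.function.LongConsumer;

// Toy model only: hypothetical stand-ins, not the real JDK/HotSpot types.
public class ToyDispatch {
    static final class ToyMethod {
        final LongConsumer fromCompiledEntry;   // stands in for the entry address
        ToyMethod(LongConsumer entry) { this.fromCompiledEntry = entry; }
    }

    static final class ToyMemberName {
        final ToyMethod vmtarget;
        ToyMemberName(ToyMethod vmtarget) { this.vmtarget = vmtarget; }
    }

    static final class ToyHandle {
        final Object type;                      // stands in for MH.type
        final ToyMemberName member;             // assumed direct link, LambdaForm step removed
        ToyHandle(Object type, ToyMemberName member) {
            this.type = type;
            this.member = member;
        }
    }

    static long sum;

    static void invoke(ToyHandle mh, Object callSiteType, long arg) {
        if (mh.type != callSiteType) {                  // load 1: type check by identity
            throw new IllegalStateException("type mismatch");
        }
        ToyMemberName member = mh.member;               // load 2
        ToyMethod method = member.vmtarget;             // load 3
        LongConsumer entry = method.fromCompiledEntry;  // load 4
        entry.accept(arg);                              // jump to the entry
    }

    static long demo() {
        Object type = new Object();
        ToyHandle mh = new ToyHandle(
                type, new ToyMemberName(new ToyMethod(v -> sum += v)));
        sum = 0;
        invoke(mh, type, 5L);
        invoke(mh, type, 7L);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each load depends on the result of the previous one, so when the handle is not a constant none of them can be folded away ahead of time.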

For example, MemberName is a sort of handle for the JVM-internal Method*. The JVM keeps a table of all MemberName instances and iterates over them when, for example, class redefinition happens. If the MemberName indirection is eliminated, then MethodHandle would point directly to JVM_Method and the JVM would have to track all MethodHandle instances instead.

JVM_Method* is required for similar reasons.

The type check on MH can't be optimized further either.

So, I'm quite pessimistic about the prospects of speeding up invocations on non-constant MethodHandles.

Vladimir Ivanov did some work a few years ago on MH customization for hot MH 
instances. It’s in the system. That should get better results than what you 
show. I wonder why it isn’t kicking in. You are using invokeExact, right?

Hey folks!

I'm running some simple benchmarks for my FOSDEM handles talk and wanted to 
reopen discussion about the performance of non-static-final method handles.

In my test, I just try to call a method that adds the given argument to a static 
long. The numbers for reflection and static final handle are what I'd expect, 
with the latter basically being equivalent to a direct call:

Direct: 0.05ns/call
Reflected: 3ns/call
static final Handle: 0.05ns/call

If the handle is coming from an instance field or local variable, however, 
performance is only slightly faster than reflection. I assume the only real 
improvement in this case is that it doesn't box the long value I pass in.

local var Handle: 2.7ns/call
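
For reference, here is a minimal sketch of the two call shapes being compared. The class and method names (HandleShapes, viaConstant, viaLocal) are made up for this example, and it is not the harness that produced the numbers above. A real measurement would need something like JMH, and would have to make sure the JIT cannot see the non-constant handle as a constant after inlining:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical illustration of the call shapes; not a benchmark harness.
public class HandleShapes {
    static long sum;

    static void add(long v) { sum += v; }

    // static final: the JIT can treat the handle as a constant and inline through it.
    static final MethodHandle CONSTANT;
    static {
        try {
            CONSTANT = MethodHandles.lookup().findStatic(
                    HandleShapes.class, "add",
                    MethodType.methodType(void.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static long viaConstant(int n) throws Throwable {
        sum = 0;
        for (int i = 0; i < n; i++) {
            CONSTANT.invokeExact(1L);   // constant receiver
        }
        return sum;
    }

    // The handle arrives as a plain parameter, so it is not a compile-time constant here.
    static long viaLocal(MethodHandle mh, int n) throws Throwable {
        sum = 0;
        for (int i = 0; i < n; i++) {
            mh.invokeExact(1L);         // non-constant receiver: type check + indirect call
        }
        return sum;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(viaConstant(1_000));
        System.out.println(viaLocal(CONSTANT, 1_000));
    }
}
```

Both loops pass the long unboxed, which matches the guess above that avoiding boxing is the main advantage left over reflection in the non-constant case.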

What can we do to improve the performance of non-static method handle 

