Re: performance issue: 7023639: JSR 292 method handle invocation needs a fast path for compiled code

John Rose Tue, 01 Mar 2011 23:14:55 -0800

On Mar 1, 2011, at 5:42 PM, Rémi Forax wrote:

> There is also another optimization that should be done.
> Once all the optimizations that you have listed are done,
> the code will be as fast (or as slow, but I'm optimistic by nature) as
> using inner classes.
> 
> Let's say we have the following code:
> 
> ...
>   private static <E> void bar(ArrayList<E> list, Mapper<E, E> mapper) {
>     int size = list.size();
>     for(int i=0; i<size; i++) {
>       list.set(i, mapper.apply(list.get(i)));
>     }
>   }
> 
> If lambdas are implemented using inner classes, mapper.apply is megamorphic
> and a vtable dispatch is done.


This is an important problem for classic method inlining, for MHs, and 
(eventually) for bulk data processing APIs.

One module contributes a loop template (bar) while another contributes the loop 
kernel (mapper1 = #{...}).

But in order to get full performance, the system has to combine the two
in a customized form.
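A minimal Java sketch of the template/kernel split, assuming a hypothetical Mapper interface with a single apply method (the quoted code uses Mapper but does not show its declaration). bar is the shared loop template; barDoubling is a hand-written stand-in for the customized copy bar' that an inliner would have to produce for one kernel. It only illustrates the shape of the goal, not how HotSpot would perform the customization.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical declaration of the Mapper type used in the quoted code.
interface Mapper<A, B> {
    B apply(A a);
}

class LoopTemplateDemo {
    // The generic loop template: one compiled copy is shared by every kernel,
    // so its mapper.apply() call site sees every Mapper class any caller passes.
    static <E> void bar(ArrayList<E> list, Mapper<E, E> mapper) {
        int size = list.size();
        for (int i = 0; i < size; i++) {
            list.set(i, mapper.apply(list.get(i)));
        }
    }

    // A hand-written "bar'": the same template customized for one kernel
    // (doubling), with the kernel body written inline, so no call remains.
    static void barDoubling(ArrayList<Integer> list) {
        int size = list.size();
        for (int i = 0; i < size; i++) {
            list.set(i, list.get(i) * 2);
        }
    }

    public static void main(String[] args) {
        ArrayList<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3));
        ArrayList<Integer> b = new ArrayList<>(Arrays.asList(1, 2, 3));

        // Three distinct kernel classes flow into the shared template,
        // which is what makes its mapper.apply() call site megamorphic.
        bar(a, new Mapper<Integer, Integer>() { public Integer apply(Integer x) { return x * 2; } });
        bar(a, new Mapper<Integer, Integer>() { public Integer apply(Integer x) { return x + 1; } });
        bar(a, new Mapper<Integer, Integer>() { public Integer apply(Integer x) { return -x; } });

        barDoubling(b);
        System.out.println(a); // [-3, -5, -7]
        System.out.println(b); // [2, 4, 6]
    }
}
```

The point of the contrast: bar has one call site shared by all kernels, while each customized copy like barDoubling is specialized to exactly one.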

In this case, HotSpot has an old optimization that can almost do this:  If (as
you say further down) the loop template (bar) is inlined (as bar') at a place 
where only one loop kernel appears, and the loop kernel is invoked via a stable 
monomorphic inline cache (unique to bar'), then the JVM has enough information 
to recompile the loop kernel call, if the compiler is run again.  There are two 
missing bits to make this happen:  First, our inlining heuristics do not detect 
loop templates for aggressive inlining.  Second, although we are just rolling 
out tiered compilation (yay!), we are only beginning to leverage the advantages
of tiered optimization.  (The inlining of a settled monomorphic call in bar' is 
an example of tiered optimization.)
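To make "a stable monomorphic inline cache (unique to bar')" concrete, here is a toy Java model of inline-cache state. It is an assumption-laden illustration only: HotSpot's real inline caches live in compiled machine code, and the class and method names below are hypothetical. It shows why the single call site in the shared bar degrades while the call site in one inlined copy bar' stays monomorphic.

```java
// Hypothetical model of an inline cache's state transitions, for illustration only.
class InlineCacheModel {
    enum State { UNINITIALIZED, MONOMORPHIC, MEGAMORPHIC }

    private State state = State.UNINITIALIZED;
    private Class<?> cachedClass;

    // Record one receiver seen at this call site and update the cache state.
    // This model goes straight to MEGAMORPHIC on a second class; VMs with
    // polymorphic inline caches would keep a short list of classes first.
    void observe(Object receiver) {
        Class<?> k = receiver.getClass();
        switch (state) {
            case UNINITIALIZED:
                cachedClass = k;
                state = State.MONOMORPHIC;
                break;
            case MONOMORPHIC:
                if (k != cachedClass) {
                    cachedClass = null;
                    state = State.MEGAMORPHIC;
                }
                break;
            case MEGAMORPHIC:
                break; // stays megamorphic
        }
    }

    State state() { return state; }

    // Two hypothetical kernel classes.
    static final class Doubler {}
    static final class Negater {}

    public static void main(String[] args) {
        InlineCacheModel shared = new InlineCacheModel();  // the one call site in bar
        InlineCacheModel perCopy = new InlineCacheModel(); // the call site in one copy bar'

        shared.observe(new Doubler());
        shared.observe(new Negater());
        perCopy.observe(new Doubler());
        perCopy.observe(new Doubler());

        System.out.println(shared.state());  // MEGAMORPHIC
        System.out.println(perCopy.state()); // MONOMORPHIC
    }
}
```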

> If lambdas are implemented using method handles, mapper.apply will directly
> call the underlying method handle (because there is only one implementation
> of Mapper).

That's true, but it will still be an out-of-line call.  It will be a race 
between classic interface dispatch and whatever indirection trickery is used 
inside method handles.  The real way to win the race is to speed both up by 
increasing opportunities for inlining.  (The invokedynamic instruction may be 
viewed as a hook for forcing method handles to inline!)
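A small sketch of the invokedynamic linkage pattern, using the java.lang.invoke API as it eventually shipped in JDK 7. The bootstrap method has the standard (Lookup, String, MethodType) shape an invokedynamic instruction links through, and binds the site to a ConstantCallSite, whose fixed target a JIT can treat as a constant and potentially inline. Java source has no syntax for writing an invokedynamic instruction by hand, so the sketch calls the bootstrap directly and invokes through dynamicInvoker(); the class name and the twice kernel are hypothetical.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class IndyHookSketch {
    // A hypothetical loop kernel as a plain static method.
    static int twice(int x) {
        return x * 2;
    }

    // Bootstrap method with the standard signature. It binds the call site
    // permanently to one target method handle.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws NoSuchMethodException, IllegalAccessException {
        MethodHandle target = lookup.findStatic(IndyHookSketch.class, name, type);
        return new ConstantCallSite(target);
    }

    // Link a site by hand and invoke through it, standing in for what an
    // invokedynamic instruction would do after its first execution.
    static int invokeKernel(int x) throws Throwable {
        CallSite site = bootstrap(MethodHandles.lookup(), "twice",
                MethodType.methodType(int.class, int.class));
        // invokeExact needs the exact (int)int type, hence the (int) cast.
        return (int) site.dynamicInvoker().invokeExact(x);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(invokeKernel(21)); // 42
    }
}
```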

> Here test is not a hot method, so bar will not be inlined into test
> and specialized for each lambda.
> The problem is how to tell the JIT that test should be inlined.
> 
> One solution is to go backward, i.e., detect that mapper.apply is a method
> handle call, and so consider that all callers of bar should be compiled even
> if they are only warm rather than hot.

Yes.  I think we can get there, now that we can use tiered compilation to 
collect profile data from warm programs, and then re-optimize them.

So far I haven't distinguished method handles from classic interface instances.
The techniques which optimize classic interfaces will be applied to method
handles, and (IMO) both will perform well.  It may be that method handles will 
be slightly easier to optimize, because they contain less noise data (a nominal 
interface implementation type).

A key missing bit in our initial implementation of method handles is an 
internal classification mechanism, so that method handles of similar "function 
shapes" will be grouped by the JVM into common "code shapes" when inlined or 
dispatched on.  (Method handles have classes at present, and they are profiled, 
but the system doesn't exploit this very well.  First we make it work, then we 
make it fast.)
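The internal "function shape" to "code shape" classification described above is not a user-visible API. The closest user-level analogue is a handle's MethodType, so this sketch groups method handles by type() as a rough stand-in. That choice is an assumption: a real classifier might erase further (for example reference types to Object, as MethodType.erase() does). The kernel methods and class name are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShapeGroupingSketch {
    // Hypothetical kernels: two share a type, one does not.
    static int inc(int x) { return x + 1; }
    static int neg(int x) { return -x; }
    static String show(int x) { return "#" + x; }

    // Group handles by MethodType, preserving first-seen order.
    static Map<MethodType, List<MethodHandle>> groupByType(List<MethodHandle> handles) {
        Map<MethodType, List<MethodHandle>> groups = new LinkedHashMap<>();
        for (MethodHandle mh : handles) {
            groups.computeIfAbsent(mh.type(), t -> new ArrayList<>()).add(mh);
        }
        return groups;
    }

    static List<MethodHandle> sampleHandles() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType intToInt = MethodType.methodType(int.class, int.class);
        List<MethodHandle> hs = new ArrayList<>();
        hs.add(lookup.findStatic(ShapeGroupingSketch.class, "inc", intToInt));
        hs.add(lookup.findStatic(ShapeGroupingSketch.class, "neg", intToInt));
        hs.add(lookup.findStatic(ShapeGroupingSketch.class, "show",
                MethodType.methodType(String.class, int.class)));
        return hs;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<MethodType, List<MethodHandle>> groups = groupByType(sampleHandles());
        groups.forEach((type, list) -> System.out.println(type + " -> " + list.size()));
        // (int)int -> 2
        // (int)String -> 1
    }
}
```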

The JVM leans heavily on concrete instance classes to determine the classic 
version of "instance shape".  It uses inline caches, profiling, and other 
techniques to do this.  If the optimizer can prove that there is a limited set 
of "shapes" at a given use point (ideally one "shape" but multiple are possible 
too) then it can output optimized code for that shape.  (N.B. I'm using the 
term "shape" in an informal metaphorical way here.)  I expect that the JVM's 
existing mechanisms and techniques for exploiting regularities in instance 
classes will cross-apply to method handles, in a well-tuned system.
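As a hand-written picture of "optimized code for that shape", the sketch below spells out in Java source what a JIT might emit at a call site whose profile showed only one receiver class: an exact class check guarding an inlined copy of the method body, with ordinary interface dispatch as the fallback. This is an assumption-labeled illustration, not generated code; in HotSpot the miss path is often a deoptimization trap rather than a call, and IntKernel, Doubler, and Negater are hypothetical names.

```java
// Hypothetical kernel interface and implementations.
interface IntKernel {
    int apply(int x);
}

final class Doubler implements IntKernel {
    public int apply(int x) { return x * 2; }
}

final class Negater implements IntKernel {
    public int apply(int x) { return -x; }
}

class GuardedInliningSketch {
    static int slowPathCount = 0;

    // Call site specialized for the single profiled receiver class (Doubler).
    static int callProfiled(IntKernel k, int x) {
        if (k.getClass() == Doubler.class) {
            return x * 2;      // inlined body of Doubler.apply
        }
        slowPathCount++;       // profile was wrong: take the generic path
        return k.apply(x);     // ordinary interface dispatch
    }

    public static void main(String[] args) {
        System.out.println(callProfiled(new Doubler(), 5)); // 10
        System.out.println(callProfiled(new Negater(), 5)); // -5
        System.out.println(slowPathCount);                  // 1
    }
}
```

The guard is cheap when the profile holds, which is why a small set of observed "shapes" at a use point is what makes this kind of specialization pay off.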

Meanwhile, "invokedynamic" provides a unique user-visible hook to kick-start 
the inlining process at a given call site.

> If someone implements that before the release of JDK 8, I will praise him
> every night.

It has always been the case that we have more ideas than we can implement.  
It's all a matter of resource allocation, for my employer, and for everybody 
else that works on the OpenJDK code.

-- John
_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
