Nice results. Thanks for pushing it through. Can we figure out what FilterGeneric$F3.invoke_V0 is doing there?
It is the combinator (f,g)=>(x,y,z)=>g(f(x),y,z).
(See near line 564 of http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/dyn/FilterGeneric.java )

The g is probably yours. The f may be (x:Object)->((boolean)(Boolean)x), as in GuardWithTest.make, which uses convertArguments to make sure the predicate produces a boolean.
(See near line 937 of http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/dyn/MethodHandleImpl.java )

I'd like to get rid of the F3.invoke_V0 frame...

-- John

On Jul 27, 2010, at 3:53 PM, Charles Oliver Nutter wrote:

> Here's the real trace...
>
> at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
> at sun.dyn.FilterGeneric$F3.invoke_V0(FilterGeneric.java:565)
> at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830)
> at bench.bench_fib_recursive.method__0$RUBY$fib_ruby(bench_fib_recursive.rb:7)
>
> The method handle graph here works out like this:
>
> * guard on the type serial number
> * fast path is the direct handle to the target method, seen above
> * slow path is the old inline-caching logic that invokes against our
>   pseudo-handles
>
> Some numbers... In this comparison the indy stuff is only optimizing
> the < + - methods to direct paths.
>
> In the first case, there's no invokedynamic and we dispatch through a
> separate piece of code that's specific to the math operator and
> Fixnum, that looks like this:
>
> public IRubyObject call(ThreadContext context, IRubyObject caller,
>         IRubyObject self, long fixnum) {
>     if (self instanceof RubyFixnum) {
>         return ((RubyFixnum) self).op_plus(context, fixnum);
>     }
>     return super.call(context, caller, self, fixnum);
> }
>
> And cases that return an IRubyObject (like the call to fib itself)
> dispatch through an object version that just does a normal monomorphic
> cache.
>
> In the second case, we're using an object Fixnum in every case
> (instead of a long for literal cases like above), and dispatching all
> three math operators through indy. In this case, there are no
> functional differences between the two call paths...for example, the
> actual pseudo-handle for + looks like this:
>
> public org.jruby.runtime.builtin.IRubyObject
> call(org.jruby.runtime.ThreadContext,
> org.jruby.runtime.builtin.IRubyObject, org.jruby.RubyModule,
> java.lang.String, org.jruby.runtime.builtin.IRubyObject);
>   Code:
>      0: aload_2
>      1: checkcast     #13  // class org/jruby/RubyFixnum
>      4: aload_1
>      5: aload         5
>      7: invokevirtual #17  // Method org/jruby/RubyFixnum.op_plus:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>     10: areturn
> }
>
> Now, the numbers:
>
> Stock JRuby with long call paths and manually-specialized
> Fixnum#<math> call sites:
>
> ~/projects/jruby ➔ jruby --server -J-XX:MaxInlineSize=150
> -J-XX:InlineSmallCode=1500 bench/bench_fib_recursive.rb 10
> 832040
>   0.409000   0.000000   0.409000 (  0.353000)
> 832040
>   0.217000   0.000000   0.217000 (  0.216000)
> 832040
>   0.217000   0.000000   0.217000 (  0.217000)
> 832040
>   0.217000   0.000000   0.217000 (  0.217000)
> 832040
>   0.217000   0.000000   0.217000 (  0.217000)
> 832040
>   0.217000   0.000000   0.217000 (  0.217000)
> 832040
>   0.217000   0.000000   0.217000 (  0.217000)
> 832040
>   0.217000   0.000000   0.217000 (  0.217000)
> 832040
>   0.217000   0.000000   0.217000 (  0.217000)
> 832040
>   0.217000   0.000000   0.217000 (  0.217000)
>
> Invokedynamic with fast path as a volatile int read + compare and direct call:
> ~/projects/jruby ➔ jruby --server -J-XX:+UnlockExperimentalVMOptions
> -J-XX:+EnableInvokeDynamic -J-Djruby.compile.invokedynamic=true
> -J-XX:MaxInlineSize=150 -J-XX:InlineSmallCode=1500
> bench/bench_fib_recursive.rb 100
> 832040
>   0.417000   0.000000   0.417000 (  0.361000)
> 832040
>   0.166000   0.000000   0.166000 (  0.166000)
> 832040
>   0.164000   0.000000   0.164000 (  0.164000)
> 832040
>   0.164000   0.000000   0.164000 (  0.164000)
> 832040
>   0.164000   0.000000   0.164000 (  0.164000)
> 832040
>   0.164000   0.000000   0.164000 (  0.164000)
> 832040
>   0.164000   0.000000   0.164000 (  0.164000)
> 832040
>   0.164000   0.000000   0.164000 (  0.164000)
> 832040
>   0.164000   0.000000   0.164000 (  0.163000)
> 832040
>   0.180000   0.000000   0.180000 (  0.180000)
>
> This is a much more impressive boost over the non-indy logic than
> previously (fast path still dispatched through our pseudo-handles),
> which I guess is due to getting those extra frames out of the call
> path:
>
> (old non-direct, via-pseudo-handle indy logic)
> ~/projects/jruby ➔ jruby --server -J-XX:+UnlockExperimentalVMOptions
> -J-XX:+EnableInvokeDynamic -J-Djruby.compile.invokedynamic=true
> -J-XX:MaxInlineSize=150 -J-XX:InlineSmallCode=1500
> bench/bench_fib_recursive.rb 10
> 832040
>   0.438000   0.000000   0.438000 (  0.382000)
> 832040
>   0.199000   0.000000   0.199000 (  0.200000)
> 832040
>   0.206000   0.000000   0.206000 (  0.205000)
> 832040
>   0.196000   0.000000   0.196000 (  0.196000)
> 832040
>   0.198000   0.000000   0.198000 (  0.198000)
> 832040
>   0.196000   0.000000   0.196000 (  0.196000)
> 832040
>   0.195000   0.000000   0.195000 (  0.195000)
> 832040
>   0.196000   0.000000   0.196000 (  0.196000)
> 832040
>   0.196000   0.000000   0.196000 (  0.196000)
> 832040
>   0.214000   0.000000   0.214000 (  0.214000)
>
> Note that this is still using the old mechanism for the calls to fib
> itself, and this is not encoding primitive indy calls where literals
> are being passed, both of which will improve performance further.
>
> Note also this is still a March build of MLVM...so I'm guessing other
> things have happened at the VM level that will improve it even more.
>
> I'm pleased with this new result!
>
> - Charlie
>
> On Tue, Jul 27, 2010 at 1:50 PM, Charles Oliver Nutter
> <head...@headius.com> wrote:
>> I'm slowly getting back into indy stuff :) I'm still running off a
>> build from March, though, since ASM doesn't support the latest
>> changes.
>>
>> Anyway, I mentioned at JVMLS that I thought I could get indy to patch
>> through to the actual target method in my existing indy stuff. I said
>> I could do it by today, but I was delayed...I have done it now :)
>>
>> I've only got it wired up for one arity case, but here's what it looks
>> like (with some of the handles still in there...these should disappear
>> as they're supported by the inlining, I presume):
>>
>> Old backtrace for def foo; 1 + 1; end
>>
>> at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
>> at org.jruby.RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.call(org/jruby/RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.gen:65535)
>> at sun.dyn.FilterGeneric$F7.invoke_F7(FilterGeneric.java:844)
>> at sun.dyn.FilterGeneric$F6.invoke_F6(FilterGeneric.java:758)
>> at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830)
>> at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
>>
>> Because the current indy stuff binds to our DynamicMethod subclass
>> (RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus), we have at least one
>> extra bounce and a lot more argument juggling because the
>> DynamicMethod.call paths are complicated.
>>
>> With the modified version, the fast path binds straight through to the
>> actual target method with no intermediate wrapper:
>>
>> at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
>> at sun.dyn.FilterGeneric$F3.invoke_V0(FilterGeneric.java:565)
>> (at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830))
>> at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
>>
>> The GuardWithTest is not yet in my toy code, but I inserted it where
>> it would be.
>> You can see that once the handles fold away, there's no
>> intermediate code between the caller and the callee.
>>
>> The interesting thing to me here is that since I know the actual
>> target method in these cases, I can decorate the handle chain with the
>> wrapper logic normally contained in the DynamicMethod subclass, which
>> means with indy we *don't have to generate our intermediate
>> pseudo-handles at all*. That's a tremendous win, for a few reasons: 1.
>> that logic will no longer count against our inlining budgets (at least
>> one stack frame and probably a good dozen+ bytecodes); and 2. I've
>> wrangled raw ASM in the pseudo-handle generation logic way too many
>> times to want to continue doing it :)
>>
>> Of course it also means we don't have the memory/size costs of
>> generating those classes ourselves.
>>
>> I'm sure I can do this same thing for field/instance variable
>> accesses, Ruby-to-Java calls, and more, and actually do iterative
>> optimizations without an interpreter or tiered compilation. That's
>> pretty cool.
>>
>> - Charlie
>>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
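
For anyone reading this thread against the API as it was later standardized in java.lang.invoke (rather than the sun.dyn prototype classes quoted above), here is a rough sketch of the handle graph being discussed: a guard on a type serial number, a direct fast path, a slow path, and a filter f applied to the first argument so the whole thing behaves like g(f(x), y). Everything in it is an assumption made for illustration: the class GuardedCallSiteSketch, the Value type, FIXNUM_SERIAL and the helper methods are hypothetical stand-ins, not JRuby or JDK code, and it uses two arguments instead of the three in the F3 combinator.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch; none of these names come from JRuby or sun.dyn.
class GuardedCallSiteSketch {
    // Stand-in for a Ruby object whose class carries a serial number.
    static final class Value {
        final int serial;
        final long n;
        Value(int serial, long n) { this.serial = serial; this.n = n; }
    }

    static final int FIXNUM_SERIAL = 7; // arbitrary value for illustration

    // Guard: true while the receiver's serial matches the cached one.
    static boolean serialIs(int expected, Value self) {
        return self.serial == expected;
    }

    // Fast path: the real target, with no wrapper class in between.
    static String plus(Value self, long other) {
        return "fast:" + (self.n + other);
    }

    // Slow path: stands in for the inline-caching fallback.
    static String slowPlus(Value self, long other) {
        return "slow:" + (self.n + other);
    }

    // Filter f for the first argument: Object -> Value.
    static Value asValue(Object o) {
        return (Value) o;
    }

    static MethodHandle build() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        Class<?> me = GuardedCallSiteSketch.class;
        MethodType binary = MethodType.methodType(String.class, Value.class, long.class);

        MethodHandle fast = lookup.findStatic(me, "plus", binary);
        MethodHandle slow = lookup.findStatic(me, "slowPlus", binary);
        MethodHandle test = lookup.findStatic(me, "serialIs",
                MethodType.methodType(boolean.class, int.class, Value.class));
        // Bind the expected serial so the test has type (Value)boolean.
        test = MethodHandles.insertArguments(test, 0, FIXNUM_SERIAL);

        // g: run the guard, then the fast or slow path; type (Value,long)String.
        MethodHandle g = MethodHandles.guardWithTest(test, fast, slow);

        // Apply f to argument 0; the result has type (Object,long)String.
        MethodHandle f = lookup.findStatic(me, "asValue",
                MethodType.methodType(Value.class, Object.class));
        return MethodHandles.filterArguments(g, 0, f);
    }

    // Convenience wrapper so callers need not handle Throwable.
    static String call(Object self, long other) {
        try {
            return (String) build().invoke(self, other);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(new Value(FIXNUM_SERIAL, 1), 1)); // takes the fast path
        System.out.println(call(new Value(99, 1), 1));            // takes the slow path
    }
}
```

The sketch rebuilds the handle on every call to keep it short; a real call site would build it once and install it as the target. Whether the filter and guard frames fold away at run time depends on the VM's inlining, which this example does not demonstrate.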