On Sun, Jun 5, 2011 at 6:18 AM, Charles Oliver Nutter <head...@headius.com> wrote:
> That said...I have not recently re-attempted installing
> invokedynamic-based primitive call paths. I'll give it another shot
> this week and see where we stand. Testing a simple loop ought to show
> quickly the overhead of invokedynamic versus my call sites.
Never one to shrug off a challenge, I decided to throw together a dirt-simple primitive invocation path for the Fixnum operations in JRuby that support it. This includes the math operators (except "/", which one library overrides), boolean operators, the comparison operator, and bitwise operators. My dirt-simple version does not install any guards: if the first invocation is against a RubyFixnum, it attempts to hardwire the call site directly to the RubyFixnum method corresponding to the operator ("op_plus" for "+", etc.).

Ignoring invokedynamic overhead, it should be faster than my call site logic, since the latter needs to:

- look up the call site (aload 0; getfield; aaload),
- invoke through the call site (invokevirtual),
- check that Fixnum has not been modified (aload 1; getfield "runtime"; getfield "fixnumHasBeenModified"; jne),
- check repeatedly that the incoming object is a fixnum (instanceof + checkcast), and
- finally make the invocation of the primitive-receiving method.

Unfortunately, the invokedynamic version is still slower. Investigation (with a simple loop) seems to show that it doesn't inline.

Here's the relevant code from before (using JRuby's specialized call sites) and after (using invokedynamic). In the "before" version you can see that the RubyFixnum.op_plus logic has inlined; I've cut it off roughly where it starts to do overflow checking on the result. The invokedynamic version does not inline, and does a callq to op_plus. Note that there are two assembly dumps for my simple loop in the PrintAssembly output, but neither of them seems to show op_plus (or op_lt, incidentally) getting inlined.

Bug? Shouldn't a virtual DMH (DirectMethodHandle) bound to an invokedynamic call site through a handful of adapters (permute + explicitCast in this case) be inlining?
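For comparison, here is a rough Java model of the specialized call-site path listed above: fetching the site from an array field, invoking through it, checking the fixnumHasBeenModified flag, the instanceof/checkcast on the receiver, and finally the primitive-receiving call. All names here (RubyRuntime, Context, RObject, Fixnum, PlusCallSite, CompiledScript) are hypothetical simplifications for illustration, not JRuby's actual classes, and overflow handling is left out.

```java
// Hypothetical, simplified model of a JRuby-style specialized call site for
// Fixnum "+". None of these class names are JRuby's real ones.
final class RubyRuntime {
    // Flipped if user code reopens Fixnum and redefines an operator.
    boolean fixnumHasBeenModified = false;
}

final class Context {
    final RubyRuntime runtime;
    Context(RubyRuntime runtime) { this.runtime = runtime; }
}

class RObject {
    // Stand-in for full dynamic dispatch (method lookup, caching, etc.).
    RObject callMethod(Context context, String name, long arg) {
        throw new UnsupportedOperationException("slow path: " + name);
    }
}

final class Fixnum extends RObject {
    final long value;
    Fixnum(long value) { this.value = value; }

    // Primitive-receiving operator; overflow promotion to Bignum is omitted.
    Fixnum opPlus(Context context, long other) { return new Fixnum(value + other); }
}

final class PlusCallSite {
    RObject call(Context context, RObject self, long arg) {
        // Guard 1: Fixnum has not been modified (context.runtime.fixnumHasBeenModified).
        // Guard 2: the receiver really is a Fixnum (instanceof + checkcast).
        if (!context.runtime.fixnumHasBeenModified && self instanceof Fixnum) {
            return ((Fixnum) self).opPlus(context, arg);
        }
        return self.callMethod(context, "+", arg);
    }
}

final class CompiledScript {
    // Compiled code keeps its call sites in an array field (getfield; aaload).
    final PlusCallSite[] callSites = { new PlusCallSite() };

    // Roughly what "a += 1" turns into: fetch the site, then invokevirtual.
    RObject incr(Context context, RObject a) {
        return callSites[0].call(context, a, 1);
    }
}
```

Every one of those steps is ordinary bytecode the JIT can inline and often fold away, which is why the invokedynamic path only wins if HotSpot inlines through the method handle chain as well.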
https://gist.github.com/1008986

Here's the relevant code:

    jruby -e "def loop; a = 0; while a < 10_000_000; a += 1; end; end; 10.times { loop }"

And the relevant addition to JRuby's indy support:

    // TODO: guards
    // (RubyFixnum, ThreadContext, long)IRubyObject, e.g. RubyFixnum.op_plus
    MethodHandle target = findVirtual(RubyFixnum.class, fastOpsMethod,
            MethodType.methodType(IRubyObject.class, ThreadContext.class, long.class));
    // Widen the receiver to IRubyObject; it is cast back to RubyFixnum on invocation
    target = MethodHandles.explicitCastArguments(target,
            MethodType.methodType(IRubyObject.class, IRubyObject.class, ThreadContext.class, long.class));
    // Adapt to the call site's type: keep args 2 (receiver), 0 (context), 4 (long); drop 1 and 3
    target = MethodHandles.permuteArguments(target,
            MethodType.methodType(IRubyObject.class, ThreadContext.class, IRubyObject.class,
                    IRubyObject.class, String.class, long.class),
            new int[] {2, 0, 4});
    site.setTarget(target);

I'm pushing this logic (enable with -Xinvokedynamic.fastops=true), but it will stay disabled until indy (on HotSpot) is faster than JRuby's built-in hacks (and I insert the appropriate guard logic!).

OH, and FWIW, here's the LogCompilation -i output roughly around where I'd expect to see op_plus and op_lt inlining:

    @ 27 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
    @ 27 java.lang.invoke.MethodHandle::invokeExact (17 bytes)
    @ 10 org.jruby.RubyFixnum::op_plus (38 bytes)
    @ 45 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
    @ 45 java.lang.invoke.MethodHandle::invokeExact (17 bytes)
    @ 10 org.jruby.RubyFixnum::op_lt (22 bytes)

Is it lying, or what? And if it's actually inlining, where's the rest of op_plus and op_lt, most of which consists of trivial, tiny methods? And why doesn't it show up as inlined in the assembly output?

- Charlie

_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
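As a sketch of where the "TODO: guards" could go, the adapter chain from the snippet above can be reproduced with toy types and wrapped in MethodHandles.guardWithTest, so that non-Fixnum receivers fall back to a slow path. Fix, Ctx, IObj, FastOps, isFix and slowPath are hypothetical stand-ins for RubyFixnum, ThreadContext, IRubyObject and a real dispatch path; only a receiver-type guard is shown, and the fixnumHasBeenModified check is not modeled. This only demonstrates that the handles compose with the right types; it says nothing about whether HotSpot will inline through them.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical stand-ins for IRubyObject, ThreadContext and RubyFixnum.
interface IObj {}

final class Ctx {}

final class Fix implements IObj {
    final long value;
    Fix(long value) { this.value = value; }
    IObj opPlus(Ctx context, long other) { return new Fix(value + other); }
}

final class FastOps {
    // Call-site type mirroring the one in the mail: (Ctx, IObj, IObj receiver, String, long)IObj.
    static final MethodType SITE_TYPE = MethodType.methodType(
            IObj.class, Ctx.class, IObj.class, IObj.class, String.class, long.class);

    static boolean isFix(IObj receiver) { return receiver instanceof Fix; }

    // Hypothetical fallback; a real one would do a full dynamic dispatch.
    static IObj slowPath(Ctx context, IObj caller, IObj receiver, String name, long arg) {
        throw new UnsupportedOperationException("slow path: " + name);
    }

    static MethodHandle bind() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Same adapter chain as in the mail: findVirtual -> explicitCast -> permute.
            MethodHandle fast = lookup.findVirtual(Fix.class, "opPlus",
                    MethodType.methodType(IObj.class, Ctx.class, long.class));
            fast = MethodHandles.explicitCastArguments(fast,
                    MethodType.methodType(IObj.class, IObj.class, Ctx.class, long.class));
            fast = MethodHandles.permuteArguments(fast, SITE_TYPE, 2, 0, 4);

            // Guard: (IObj)boolean, padded with two leading args so it sees the receiver.
            MethodHandle test = lookup.findStatic(FastOps.class, "isFix",
                    MethodType.methodType(boolean.class, IObj.class));
            test = MethodHandles.dropArguments(test, 0, Ctx.class, IObj.class);

            MethodHandle slow = lookup.findStatic(FastOps.class, "slowPath", SITE_TYPE);

            MutableCallSite site = new MutableCallSite(SITE_TYPE);
            site.setTarget(MethodHandles.guardWithTest(test, fast, slow));
            return site.dynamicInvoker();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper so callers need not handle invokeExact's "throws Throwable".
    static IObj call(MethodHandle site, Ctx context, IObj caller, IObj receiver,
                     String name, long arg) {
        try {
            return (IObj) site.invokeExact(context, caller, receiver, name, arg);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

The guard's parameter list only has to match the leading parameters of the target, which is why dropArguments pads isFix with the context and caller types instead of taking the full five-argument signature.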