I'm seeing something peculiar and wanted to run it by you folks. There are a few values that JRuby's compiler had previously been loading from instance fields every time they're needed. Specifically, these are fields like ThreadContext.runtime (the current JRuby runtime) and Ruby.falseObject, Ruby.trueObject, Ruby.nilObject (the false, true, and nil values). I figured I'd make a quick change today and have those instead be constant method handles bound into a mutable call site.
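For anyone who wants to see the shape of the idea without reading the patch, here is a minimal sketch in plain Java. It is not the JRuby code: FakeRuntime, FakeContext, ConstantSiteSketch, newRuntimeSite, initRuntime, and load are all hypothetical names made up for illustration, and since Java source can't emit invokedynamic directly, the call site is driven through dynamicInvoker() instead.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class ConstantSiteSketch {
    // Hypothetical stand-ins for org.jruby.Ruby and ThreadContext.
    static final class FakeRuntime { }

    static final class FakeContext {
        final FakeRuntime runtime;
        FakeContext(FakeRuntime runtime) { this.runtime = runtime; }
    }

    // Plays the role of the bootstrap: creates a (FakeContext)FakeRuntime
    // site whose initial target is initRuntime with the site bound in.
    static MutableCallSite newRuntimeSite() {
        try {
            MethodType siteType =
                MethodType.methodType(FakeRuntime.class, FakeContext.class);
            MutableCallSite site = new MutableCallSite(siteType);
            MethodHandle init = MethodHandles.lookup().findStatic(
                ConstantSiteSketch.class, "initRuntime",
                MethodType.methodType(FakeRuntime.class,
                                      MutableCallSite.class, FakeContext.class));
            site.setTarget(init.bindTo(site));
            return site;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // First call only: read the value through the context, then rebind the
    // site to a constant handle that ignores its FakeContext argument.
    static FakeRuntime initRuntime(MutableCallSite site, FakeContext context) {
        FakeRuntime runtime = context.runtime;
        MethodHandle constant =
            MethodHandles.constant(FakeRuntime.class, runtime);
        site.setTarget(
            MethodHandles.dropArguments(constant, 0, FakeContext.class));
        return runtime;
    }

    // Stand-in for the invokedynamic instruction in compiled code.
    static FakeRuntime load(MutableCallSite site, FakeContext context) {
        try {
            return (FakeRuntime) site.dynamicInvoker().invokeExact(context);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

After the first load() the site no longer looks at the context at all, which is what should let the JIT treat the value as a constant. It also means the sketch only makes sense if every context reaching a given site shares the same runtime.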
Unfortunately, performance seems to be worse. The logic works like this:

* ThreadContext is loaded to stack
* invokedynamic, bootstrap just wires up an initialization method into a MutableCallSite
* initialization method rebinds call site forever to a constant method handle pointing at the value (runtime/true/false/nil objects)

My expectation was that this would be at least no slower (and potentially a tiny bit faster) but also less bytecode (in the case of true/false/nil, it was previously doing ThreadContext.runtime.getNil()/getTrue()/getFalse()). It seems like it's actually slower than walking those references, though, and I'm not sure why.

Here's a couple of the scenarios in diff form showing bytecode before and bytecode after:

Loading "runtime"

  ALOAD 1
- GETFIELD org/jruby/runtime/ThreadContext.runtime : Lorg/jruby/Ruby;
+ INVOKEDYNAMIC getRuntime (Lorg/jruby/runtime/ThreadContext;)Lorg/jruby/Ruby; [org/jruby/runtime/invokedynamic/InvokeDynamicSupport.getObjectBootstrap(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite; (6)]

Loading "false"

  ALOAD 1
- GETFIELD org/jruby/runtime/ThreadContext.runtime : Lorg/jruby/Ruby;
- INVOKEVIRTUAL org/jruby/Ruby.getFalse ()Lorg/jruby/RubyBoolean;
+ INVOKEDYNAMIC getFalse (Lorg/jruby/runtime/ThreadContext;)Lorg/jruby/RubyBoolean; [org/jruby/runtime/invokedynamic/InvokeDynamicSupport.getObjectBootstrap(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite; (6)]

I think because these are now seen as invocations, I'm hitting some inlining budget limit I didn't hit before (and which isn't being properly discounted). The benchmark I'm seeing degrade is bench/language/bench_flip.rb, and it's a pretty significant degradation.
Only the "heap" version shows the degradation, and it definitely does have more bytecode...but the bytecode with my patch differs only in the way these values are being accessed, as shown in the diffs above.

Before:

                                       user     system      total        real
1m x10 while (a)..(!a) (heap)      0.951000   0.000000   0.951000 (  0.910000)
                                       user     system      total        real
1m x10 while (a)..(!a) (heap)      0.705000   0.000000   0.705000 (  0.705000)
                                       user     system      total        real
1m x10 while (a)..(!a) (heap)      0.688000   0.000000   0.688000 (  0.688000)
                                       user     system      total        real

After:

                                       user     system      total        real
1m x10 while (a)..(!a) (heap)      2.350000   0.000000   2.350000 (  2.284000)
                                       user     system      total        real
1m x10 while (a)..(!a) (heap)      2.128000   0.000000   2.128000 (  2.128000)
                                       user     system      total        real
1m x10 while (a)..(!a) (heap)      2.115000   0.000000   2.115000 (  2.116000)
                                       user     system      total        real

You can see the degradation is pretty bad. I'm concerned because I had hoped that invokedynamic + mutable call site + constant handle would always be faster than a field access...since it avoids excessive field accesses and makes it possible for Hotspot to fold those constants away.

What's going on here?

Patch for the change (apply to JRuby master) is here: https://gist.github.com/955976b52b0c4e3f611e

- Charlie
_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev