On Thu, Apr 28, 2011 at 5:16 AM, Christian Thalinger
<christian.thalin...@oracle.com> wrote:
> I took a look at it.  I used 64-bit x86 since the code is a bit smaller than
> with 32-bit.
>
> The code is almost identical but three things popped into my eye (the output
> is from PrintOptoAssembly):
>
> 1. The obvious one: the method handle call site guard:
>
> 1a4   B32: #  B160 B33 <- B31 B149 B123  Freq: 0.499969
> 1a4   movq    R10, byte[int:>=0]<ciObject ident=770 PERM address=0xe99088> * # ptr
> 1ae   movq    R10, [R10 + #1576 (32-bit)]  # ptr
> 1b5   movq    R11, [R10 + #32 (8-bit)]  # ptr
> 1b9   movq    R8, java/lang/invoke/AdapterMethodHandle:exact *  # ptr
> 1c3   cmpq    R11, R8  # ptr
> 1c6   jne,u   B160  P=0.000000 C=-1.000000

I saw in your other email that eliminating this puts indy on par with
dynopt, which is spectacular news. Can you elaborate on how that could be
done "correctly" (as in not via a hack)? Would it be a lighter-weight check
and deopt of some kind (in Hotspot), or is it something I'd need to rig up
in my code?

> 2. The dynopt version only has one class check while the indy version has two
> (before and after the recursive call site).  This could be because of basic
> block layout but I'm curious why it's laid out differently:
...
> indy:
> -----
>
> 1cc   B33: #  B174 B34 <- B32  Freq: 0.499969
> 1cc   movq    R10, [rsp + #80]  # spill
> 1d1   movq    R10, [R10 + #8 (8-bit)]  # class
> 1d5   NullCheck R10
> 1d5
> 1d5   B34: #  B114 B35 <- B33  Freq: 0.499969
> 1d5   movq    R10, [R10 + #64 (8-bit)]  # class
> 1d9   movq    R11, precise klass org/jruby/RubyBasicObject: 0x00000000011f5478:Constant:exact *  # ptr
> 1e3   cmpq    R10, R11  # ptr
> 1e6   jne,u   B114  P=0.000001 C=-1.000000
> 1e6
> 1ec   B35: #  B175 B36 <- B34  Freq: 0.499968
> 1ec   movq    R10, [rsp + #80]  # spill
> 1f1   # checkcastPP of R10
> 1f1   movq    R10, [R10 + #24 (8-bit)]  # ptr ! Field org/jruby/RubyBasicObject.metaClass
> 1f5   movl    R11, [R10 + #44 (8-bit)]  # int ! Field org/jruby/RubyModule.generation
> 1f9   NullCheck R10
> 1f9
> 1f9   B36: #  B124 B37 <- B35  Freq: 0.499968
> 1f9   cmpl    R11, #632
> 200   jne     B124  P=0.000000 C=209925.000000
> 200

I'll have to read through the PrintAssembly output to see if both guards
are being traversed on the fast path. Hopefully they're not... I assume
we'd see more degradation in the indy case if that were happening, though.

I've been trying to think of ways to reduce the guard cost, since the perf
without the JRuby guard is a fair bit better (0.79s versus 0.63s for
fib(35)). The performance without guards is actually faster than any other
Ruby implementation I've yet run.

One idea:

call site
  => SwitchPoint invalidated if Fixnum is reopened (rare)
    => GWT guarded on exact object type RubyFixnum
      => RubyFixnum method

This would avoid traversing the metaclass and generation fields and doing
the generation compare.

This approach could also work for all core JRuby classes. Basically, where
subclasses of Array are currently backed by the same RubyArray object, I
would introduce a RubyArraySubclass object for that purpose. That would
guarantee that only regular Array objects are RubyArray, allowing me to
reduce any invocations against Array to a switchpoint + type check.

A question: what would be the best way currently to emit the cheapest
possible type guard? There's currently no "instanceof" adapter that can do
that type check for me, so I'd be reduced to something like a Class
equality check. Basically I'm looking for the right way to emit an exact
type check that will optimize to the equivalent check Hotspot does for
virtual method invocations.

Help?

- Charlie
_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
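
For reference, the "SwitchPoint => GWT on exact type => method" chain above
can be sketched with the public java.lang.invoke API roughly as follows.
This is only an illustration under stated assumptions: Fixnum,
FixnumSubclass, fastPath, slowPath, isExactly, build and dispatch are
hypothetical stand-ins, not JRuby code, and the test is the plain Class
equality check mentioned above. Whether Hotspot reduces it to the same
klass compare it emits for virtual calls is exactly the open question, so
nothing here should be read as an answer to it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

final class ExactTypeGuardSketch {
    // Hypothetical stand-ins for RubyFixnum and a RubyArraySubclass-style type.
    static class Fixnum { }
    static class FixnumSubclass extends Fixnum { }

    // Exact-type test: true only if the receiver's class is exactly 'expected'.
    static boolean isExactly(Class<?> expected, Object receiver) {
        return receiver != null && receiver.getClass() == expected;
    }

    // Hypothetical targets: the cached method and the generic lookup path.
    static String fastPath(Object receiver) { return "fast"; }
    static String slowPath(Object receiver) { return "slow"; }

    // Builds: SwitchPoint guard => exact-type guardWithTest => fast path,
    // with both guards falling back to the slow path.
    static MethodHandle build(SwitchPoint sp) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType pathType = MethodType.methodType(String.class, Object.class);
            MethodHandle fast = lookup.findStatic(ExactTypeGuardSketch.class, "fastPath", pathType);
            MethodHandle slow = lookup.findStatic(ExactTypeGuardSketch.class, "slowPath", pathType);
            MethodHandle test = lookup.findStatic(ExactTypeGuardSketch.class, "isExactly",
                    MethodType.methodType(boolean.class, Class.class, Object.class))
                    .bindTo(Fixnum.class); // fix the expected class as a constant
            MethodHandle typeGuarded = MethodHandles.guardWithTest(test, fast, slow);
            return sp.guardWithTest(typeGuarded, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Small helper so callers do not have to deal with Throwable.
    static String dispatch(MethodHandle site, Object receiver) {
        try {
            return (String) site.invokeExact(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        SwitchPoint sp = new SwitchPoint();
        MethodHandle site = build(sp);
        System.out.println(dispatch(site, new Fixnum()));         // fast
        System.out.println(dispatch(site, new FixnumSubclass())); // slow
        // Simulate "Fixnum is reopened": invalidate the switch point.
        SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
        System.out.println(dispatch(site, new Fixnum()));         // slow
    }
}
```

In a real call site the slow path would presumably relink a MutableCallSite
rather than return a constant; that part is left out to keep the sketch
short.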