On Apr 27, 2011, at 5:54 AM, Charles Oliver Nutter wrote:
> I prepared this for someone else, but I thought folks here might be
> interested in it too.
>
> This gist contains hotspot x86 (32-bit) assembly output for JRuby's
> dynopt mode and invokedynamic (on a couple-week-old OS X OpenJDK
> build). I haven't spent a lot of time investigating.

I took a look at it. I used 64-bit x86 since the code is a bit smaller than with 32-bit. The code is almost identical, but three things caught my eye (the output is from PrintOptoAssembly):

1. The obvious one: the method handle call site guard:

    1a4 B32: # B160 B33 <- B31 B149 B123 Freq: 0.499969
    1a4 movq R10, byte[int:>=0]<ciObject ident=770 PERM address=0xe99088> * # ptr
    1ae movq R10, [R10 + #1576 (32-bit)] # ptr
    1b5 movq R11, [R10 + #32 (8-bit)] # ptr
    1b9 movq R8, java/lang/invoke/AdapterMethodHandle:exact * # ptr
    1c3 cmpq R11, R8 # ptr
    1c6 jne,u B160 P=0.000000 C=-1.000000

2. The dynopt version only has one class check while the indy version has two (before and after the recursive call site). This could be because of basic block layout, but I'm curious why it's laid out differently:

dynopt:
-------

    <recursive call site>

    209 B37: # B142 B38 <- B102 B36 Freq: 0.499974
    209 movq R10, [rsp + #16] # spill
    20e movq R11, precise klass org/jruby/RubyObject: 0x0000000000f4cf88:Constant:exact * # ptr
    218 cmpq R10, R11 # ptr
    21b jne,u B142 P=0.000000 C=-1.000000
    21b
    221 B38: # B161 B39 <- B37 Freq: 0.499974
    221 movq R10, [rsp + #64] # spill
    226 # checkcastPP of R10
    226 movq R10, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/RubyBasicObject.metaClass
    22a movl R10, [R10 + #44 (8-bit)] # int ! Field org/jruby/RubyModule.generation
    22e NullCheck R10
    22e
    22e B39: # B107 B40 <- B38 Freq: 0.499974
    22e cmpl R10, #632
    235 jne B107 P=0.000000 C=563147.000000
    235
    23b B40: # B162 B41 <- B39 Freq: 0.499973
    23b movq R9, [rsp + #0] # spill
    23f movq R10, [R9 + #16 (8-bit)] # ptr ! Field org/jruby/ast/executable/AbstractScript.runtimeCache
    243 movq RBP, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/ast/executable/RuntimeCache.callSites
    247 NullCheck R10

indy:
-----

    1cc B33: # B174 B34 <- B32 Freq: 0.499969
    1cc movq R10, [rsp + #80] # spill
    1d1 movq R10, [R10 + #8 (8-bit)] # class
    1d5 NullCheck R10
    1d5
    1d5 B34: # B114 B35 <- B33 Freq: 0.499969
    1d5 movq R10, [R10 + #64 (8-bit)] # class
    1d9 movq R11, precise klass org/jruby/RubyBasicObject: 0x00000000011f5478:Constant:exact * # ptr
    1e3 cmpq R10, R11 # ptr
    1e6 jne,u B114 P=0.000001 C=-1.000000
    1e6
    1ec B35: # B175 B36 <- B34 Freq: 0.499968
    1ec movq R10, [rsp + #80] # spill
    1f1 # checkcastPP of R10
    1f1 movq R10, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/RubyBasicObject.metaClass
    1f5 movl R11, [R10 + #44 (8-bit)] # int ! Field org/jruby/RubyModule.generation
    1f9 NullCheck R10
    1f9
    1f9 B36: # B124 B37 <- B35 Freq: 0.499968
    1f9 cmpl R11, #632
    200 jne B124 P=0.000000 C=209925.000000
    200

    <recursive call site>

    237 B40: # B86 B41 <- B39 Freq: 0.499957
    237 movq R10, [RSI + #40 (8-bit)] # class
    23b movq R11, precise klass org/jruby/runtime/builtin/IRubyObject: 0x00000000011ce468:Constant:exact * # ptr
    245 cmpq R10, R11 # ptr
    248 jne,u B86 P=0.170000 C=-1.000000
    248
    24e B41: # B42 <- B40 B86 Freq: 0.499957
    24e # checkcastPP of RBP
    24e movq [rsp + #96], RBP # spill
    24e
    253 B42: # B177 B43 <- B41 B161 Freq: 0.499957
    253 movq R10, [rsp + #24] # spill
    258 movq R10, [R10 + #16 (8-bit)] # ptr ! Field org/jruby/ast/executable/AbstractScript.runtimeCache
    25c movq RBP, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/ast/executable/RuntimeCache.callSites
    260 NullCheck R10

3. The dynopt version has two occurrences of the following block in the hot code path, while the indy version has three of them (this could also be because of code layout, as in 2.):

    1ec B35: # B175 B36 <- B34 Freq: 0.499968
    1ec movq R10, [rsp + #80] # spill
    1f1 # checkcastPP of R10
    1f1 movq R10, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/RubyBasicObject.metaClass
    1f5 movl R11, [R10 + #44 (8-bit)] # int ! Field org/jruby/RubyModule.generation
    1f9 NullCheck R10

-- Christian

>
> https://gist.github.com/943357
>
> One thing I did notice is that MaxRecursiveInlineLevel appears to be 1
> by default normally. I played with bumping it up but performance
> degraded no matter what combination of flags I used.
>
> A related question: what would it take to get the hsdis plugin
> included with openjdk proper all the time? It would be nice if
> PrintAssembly worked out of the box on all Java 7 builds.
>
> - Charlie
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev