I have found an interesting clue to poor startup!

It appears that DefaultRubyParser.yyparse (the main parser method) is
*still* never being jitted by the JVM, in either client or server
modes (at least on OS X Java, which should be pretty normal).

Here's the output of PrintCompilation | grep yyparse when running
bench/bench_load.rb:

~/projects/jruby ➔ jruby -J-XX:+PrintCompilation bench/bench_load.rb | grep yyparse

There's no typo there...the output is empty, because yyparse simply
never gets jitted. I haven't dug into the deeper HotSpot flags to see
why, but it's definitely not being compiled.

And the performance is pretty dismal...here are MRI's results (1.8 and
1.9) followed by JRuby on Java 5/6 client/server. I'm only including
the first result from each run, since it's the cold perf we're
interested in:

(bear with me, there's a nice payoff in the end)

~/projects/jruby ➔ ruby bench_load.rb
                                         user     system      total        real
 1K load 'fileutils-like'            2.930000   0.090000   3.020000 (  3.039627)
 1K load 'rational'                  0.900000   0.060000   0.960000 (  0.960956)

~/projects/jruby ➔ ruby1.9 bench_load.rb
                                         user     system      total        real
 1K load 'fileutils-like'            4.670000   0.110000   4.780000 (  4.790452)
 1K load 'rational'                  0.090000   0.030000   0.120000 (  0.135478)

~/projects/jruby ➔ jruby -v bench_load.rb
jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java
HotSpot(TM) Client VM 1.5.0_19) [i386-java]
                                         user     system      total        real
 1K load 'fileutils-like'           23.268000   0.000000  23.268000 ( 23.243000)
 1K load 'rational'                  6.601000   0.000000   6.601000 (  6.601000)

~/projects/jruby ➔ jruby --server -v bench_load.rb
jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java
HotSpot(TM) Server VM 1.5.0_19) [i386-java]
                                         user     system      total        real
 1K load 'fileutils-like'           16.808000   0.000000  16.808000 ( 16.734000)
 1K load 'rational'                  5.374000   0.000000   5.374000 (  5.374000)

~/projects/jruby ➔ (pickjdk 3 ; jruby -J-d32 --client -v bench_load.rb)
New JDK: 1.6.0
jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java
HotSpot(TM) Client VM 1.6.0_17) [i386-java]
                                         user     system      total        real
 1K load 'fileutils-like'           15.589000   0.000000  15.589000 ( 15.567000)
 1K load 'rational'                  4.962000   0.000000   4.962000 (  4.962000)

~/projects/jruby ➔ (pickjdk 3 ; jruby -v bench_load.rb)
New JDK: 1.6.0
jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java
HotSpot(TM) 64-Bit Server VM 1.6.0_17) [x86_64-java]
                                         user     system      total        real
 1K load 'fileutils-like'           16.191000   0.000000  16.191000 ( 16.151000)
 1K load 'rational'                  5.447000   0.000000   5.447000 (  5.447000)

Now, in looking at yyparse, there are not a lot of obvious "outlining"
possibilities in the pre- and post-switch code. They're rather involved
and manipulate local variables heavily. A change we made in 1.3 or so
outlined all the case bodies, which definitely helped our performance.
But it wasn't enough to get yyparse compiling...it only allowed the
bodies to compile.
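
To make "outlining the case bodies" concrete, here is a minimal sketch
of the idea. It is not the actual DefaultRubyParser code: the class,
the method names and the toy "rules" are all hypothetical, chosen only
to show the shape of the transformation.

```java
// Hypothetical miniature of a generated parser loop. None of these
// names come from DefaultRubyParser; they only illustrate the shape.
public class CaseBodyOutlining {
    // Before: every case body lives inline, so the enclosing method
    // grows with every grammar rule.
    static int parseInline(int[] rules) {
        int acc = 0;
        for (int rule : rules) {
            switch (rule) {
                case 1: acc = acc + 1; break; // imagine dozens of lines here
                case 2: acc = acc * 2; break; // ...and here
                default: break;
            }
        }
        return acc;
    }

    // After: each case body is its own small method that the JIT can
    // compile independently of the big loop.
    static int case1(int acc) { return acc + 1; }
    static int case2(int acc) { return acc * 2; }

    static int parseOutlined(int[] rules) {
        int acc = 0;
        for (int rule : rules) {
            switch (rule) {
                case 1: acc = case1(acc); break;
                case 2: acc = case2(acc); break;
                default: break;
            }
        }
        return acc;
    }
}
```

Both versions compute the same thing; only the placement of the code
changes. Note that the switch itself, with one arm per rule, still
sits inside the main loop method.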

There is, however, one obvious outlining possibility: the switch
itself. I made a modification to outline the switch, passing in the
values necessary for the switch and all the cases. The results are
pretty striking:

~/projects/jruby ➔ jruby -v bench_load.rb
jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java
HotSpot(TM) Client VM 1.5.0_19) [i386-java]
                                         user     system      total        real
 1K load 'fileutils-like'           14.424000   0.000000  14.424000 ( 14.400000)
 1K load 'rational'                  3.781000   0.000000   3.781000 (  3.781000)

~/projects/jruby ➔ jruby --server -v bench_load.rb
jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java
HotSpot(TM) Server VM 1.5.0_19) [i386-java]
                                         user     system      total        real
 1K load 'fileutils-like'            7.594000   0.000000   7.594000 (  7.535000)
 1K load 'rational'                  2.293000   0.000000   2.293000 (  2.293000)

~/projects/jruby ➔ (pickjdk 3 ; jruby -J-d32 --client -v bench_load.rb)
New JDK: 1.6.0
jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java
HotSpot(TM) Client VM 1.6.0_17) [i386-java]
                                         user     system      total        real
 1K load 'fileutils-like'            7.852000   0.000000   7.852000 (  7.829000)
 1K load 'rational'                  2.424000   0.000000   2.424000 (  2.424000)

~/projects/jruby ➔ (pickjdk 3 ; jruby -v bench_load.rb)
New JDK: 1.6.0
jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java
HotSpot(TM) 64-Bit Server VM 1.6.0_17) [x86_64-java]
                                         user     system      total        real
 1K load 'fileutils-like'            7.196000   0.000000   7.196000 (  7.150000)
 1K load 'rational'                  2.335000   0.000000   2.335000 (  2.334000)

You're reading that right...around a 2x or better increase in parse
performance for this benchmark on most of these JVMs (closer to 1.6x
on the Java 5 client VM). It remains to be seen how much of normal
application startup is purely parser-related, but this can only
improve it.
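
For anyone who wants to picture the switch outlining, here is a rough
sketch. Again, this is not the real patch: the class, the `dispatch`
method and its parameters are hypothetical stand-ins for "the values
necessary for the switch and all the cases".

```java
// Hypothetical sketch of outlining the switch itself. The names and
// the toy "rules" are illustrative, not taken from DefaultRubyParser.
public class SwitchOutlining {
    static int case1(int acc) { return acc + 1; }
    static int case2(int acc) { return acc * 2; }

    // The outlined switch: it receives everything the cases need as
    // parameters and hands the result back to the caller.
    static int dispatch(int rule, int acc) {
        switch (rule) {
            case 1: return case1(acc);
            case 2: return case2(acc);
            default: return acc;
        }
    }

    // The main loop now contains only the pre- and post-switch logic
    // plus a single call, so it is much smaller than before.
    static int parse(int[] rules) {
        int acc = 0;
        for (int rule : rules) {
            // pre-switch bookkeeping would go here
            acc = dispatch(rule, acc);
            // post-switch bookkeeping would go here
        }
        return acc;
    }
}
```

The point of the design is that the loop method and the switch method
are each small enough to be considered for compilation on their own,
instead of one enormous method holding both.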

Tom: Feel like adding one more post-generation parser code
transformation to your build?

- Charlie
