I have found an interesting clue to our poor startup performance! It appears that DefaultRubyParser.yyparse (the main parser method) is *still* never being jitted by the JVM, in either client or server mode (at least on OS X Java, which should be pretty normal).
Here's the output of PrintCompilation | grep yyparse when running bench/bench_load.rb:

    ~/projects/jruby ➔ jruby -J-XX:+PrintCompilation bench/bench_load.rb | grep yyparse

There's no typo there...it's simply not jitting. I haven't dug into the deeper HotSpot flags to see why it's not jitting, but it's definitely not jitting.

And the performance is pretty dismal...here's MRI's result followed by JRuby on Java 5/6 client/server. I'm only including the first result, since it's the cold perf we're interested in (bear with me, there's a nice payoff in the end):

    ~/projects/jruby ➔ ruby bench_load.rb
                                   user     system      total        real
    1K load 'fileutils-like'   2.930000   0.090000   3.020000 (  3.039627)
    1K load 'rational'         0.900000   0.060000   0.960000 (  0.960956)

    ~/projects/jruby ➔ ruby1.9 bench_load.rb
                                   user     system      total        real
    1K load 'fileutils-like'   4.670000   0.110000   4.780000 (  4.790452)
    1K load 'rational'         0.090000   0.030000   0.120000 (  0.135478)

    ~/projects/jruby ➔ jruby -v bench_load.rb
    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java HotSpot(TM) Client VM 1.5.0_19) [i386-java]
                                   user     system      total        real
    1K load 'fileutils-like'  23.268000   0.000000  23.268000 ( 23.243000)
    1K load 'rational'         6.601000   0.000000   6.601000 (  6.601000)

    ~/projects/jruby ➔ jruby --server -v bench_load.rb
    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java HotSpot(TM) Server VM 1.5.0_19) [i386-java]
                                   user     system      total        real
    1K load 'fileutils-like'  16.808000   0.000000  16.808000 ( 16.734000)
    1K load 'rational'         5.374000   0.000000   5.374000 (  5.374000)

    ~/projects/jruby ➔ (pickjdk 3 ; jruby -J-d32 --client -v bench_load.rb)
    New JDK: 1.6.0
    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java HotSpot(TM) Client VM 1.6.0_17) [i386-java]
                                   user     system      total        real
    1K load 'fileutils-like'  15.589000   0.000000  15.589000 ( 15.567000)
    1K load 'rational'         4.962000   0.000000   4.962000 (  4.962000)

    ~/projects/jruby ➔ (pickjdk 3 ; jruby -v bench_load.rb)
    New JDK: 1.6.0
    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_17) [x86_64-java]
                                   user     system      total        real
    1K load 'fileutils-like'  16.191000   0.000000  16.191000 ( 16.151000)
    1K load 'rational'         5.447000   0.000000   5.447000 (  5.447000)

Now, in looking at yyparse, there are not a lot of obvious "outlining" possibilities in the pre and post-switch code. They're rather involved and manipulate local variables heavily. A change we made in 1.3 or so outlined all the case bodies, which definitely helped our performance. But it wasn't enough to get yyparse compiling...it only allowed the bodies to compile.

There is, however, one obvious outlining possibility: the switch itself. I made a modification to outline the switch, passing in the values necessary for the switch and all the cases. The results are pretty striking:

    ~/projects/jruby ➔ jruby -v bench_load.rb
    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java HotSpot(TM) Client VM 1.5.0_19) [i386-java]
                                   user     system      total        real
    1K load 'fileutils-like'  14.424000   0.000000  14.424000 ( 14.400000)
    1K load 'rational'         3.781000   0.000000   3.781000 (  3.781000)

    ~/projects/jruby ➔ jruby --server -v bench_load.rb
    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java HotSpot(TM) Server VM 1.5.0_19) [i386-java]
                                   user     system      total        real
    1K load 'fileutils-like'   7.594000   0.000000   7.594000 (  7.535000)
    1K load 'rational'         2.293000   0.000000   2.293000 (  2.293000)

    ~/projects/jruby ➔ (pickjdk 3 ; jruby -J-d32 --client -v bench_load.rb)
    New JDK: 1.6.0
    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java HotSpot(TM) Client VM 1.6.0_17) [i386-java]
                                   user     system      total        real
    1K load 'fileutils-like'   7.852000   0.000000   7.852000 (  7.829000)
    1K load 'rational'         2.424000   0.000000   2.424000 (  2.424000)

    ~/projects/jruby ➔ (pickjdk 3 ; jruby -v bench_load.rb)
    New JDK: 1.6.0
    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2010-01-16 984bcf1) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_17) [x86_64-java]
                                   user     system      total        real
    1K load 'fileutils-like'   7.196000   0.000000   7.196000 (  7.150000)
    1K load 'rational'         2.335000   0.000000   2.335000 (  2.334000)

You're reading that right...a better than 2x increase in parse performance for this benchmark. It remains to be seen how much of normal application startup is purely parser-related, but this can only improve it.

Tom: Feel like adding one more post-generation parser code transformation to your build?

- Charlie
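
For readers unfamiliar with the term, the "outline the switch" change described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual DefaultRubyParser code or the actual patch: the class, the rule constants, and the method names (parseInline, parseOutlined, reduce) are all hypothetical, and a three-case toy switch stands in for the generated parser's very large one. A likely explanation for why outlining helps, which the message does not confirm, is that HotSpot by default declines to JIT-compile methods whose bytecode exceeds 8000 bytes (the DontCompileHugeMethods flag), so moving the switch into its own method may bring the main loop back under that limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy stand-in for a generated parser. All names are hypothetical and
 * nothing here is taken from DefaultRubyParser.
 */
public class OutlinedSwitchSketch {
    // Hypothetical rule numbers standing in for grammar reductions.
    static final int PUSH_ONE = 0;
    static final int ADD = 1;
    static final int NEGATE = 2;

    /** Before: the dispatch switch is inline, so the loop method grows with every rule. */
    static int parseInline(int[] rules) {
        Deque<Integer> values = new ArrayDeque<Integer>();
        for (int rule : rules) {
            int result;
            switch (rule) {
                case PUSH_ONE: result = 1; break;
                case ADD:      result = values.pop() + values.pop(); break;
                case NEGATE:   result = -values.pop(); break;
                default: throw new IllegalStateException("unknown rule " + rule);
            }
            values.push(result);
        }
        return values.pop();
    }

    /** After: the loop only calls out, so its own bytecode stays small. */
    static int parseOutlined(int[] rules) {
        Deque<Integer> values = new ArrayDeque<Integer>();
        for (int rule : rules) {
            // reduce() runs first and may pop operands; its result is then pushed.
            values.push(reduce(rule, values));
        }
        return values.pop();
    }

    /** The outlined switch: everything the cases need is passed in as parameters. */
    static int reduce(int rule, Deque<Integer> values) {
        switch (rule) {
            case PUSH_ONE: return 1;
            case ADD:      return values.pop() + values.pop();
            case NEGATE:   return -values.pop();
            default: throw new IllegalStateException("unknown rule " + rule);
        }
    }

    public static void main(String[] args) {
        int[] rules = {PUSH_ONE, PUSH_ONE, ADD, NEGATE};
        // Both variants compute the same value; only the method layout differs.
        System.out.println(parseInline(rules) + " " + parseOutlined(rules));
    }
}
```

Both variants behave identically; the only difference is which method carries the bulk of the bytecode. Whether a given method actually gets compiled can be checked the same way as in the message, by grepping the -XX:+PrintCompilation output for its name.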