As some of you know, I've been busily migrating all method binding to use Java annotations. The main reasons for this are to simplify binding and to provide end-to-end metadata that can be used for optimizing methods. It has enabled using a single binding generator for 90% of methods in the system (and increasing). And today that has enabled some impressive perf improvements.

The first step I took today was migrating all annotation-based binding to directly generate unique DynamicMethod subclasses rather than unique Callback subclasses that would then be wrapped in a generic DynamicMethod implementation. This moves generated code closer to the actual calls.
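To picture the difference between the two shapes, here is a rough illustration written in Ruby rather than in JRuby's actual Java internals. Every class name in it (SketchCallback, GenericMethodSketch, DirectMethodSketch) is hypothetical and invented for this sketch. It only shows the extra hop a wrapped callback adds compared with a method object that does the work itself.

```ruby
# Illustration only: hypothetical names, not JRuby's real classes.

# Old shape: a generated callback that a generic method object forwards to.
class SketchCallback
  def execute(receiver, args)
    receiver + args[0]
  end
end

class GenericMethodSketch
  def initialize(callback)
    @callback = callback
  end

  def call(receiver, args)
    # Extra hop: the generic wrapper delegates to the callback.
    @callback.execute(receiver, args)
  end
end

# New shape: the generated class is itself the method object,
# so the call reaches the real work one step sooner.
class DirectMethodSketch
  def call(receiver, args)
    receiver + args[0]
  end
end

wrapped = GenericMethodSketch.new(SketchCallback.new)
direct  = DirectMethodSketch.new

puts wrapped.call(1, [2])
puts direct.call(1, [2])
```

Both paths produce the same result; the second simply has one less layer of delegation between the caller and the bound code.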

The second step was to completely disable STI dispatch.

Of course fib numbers are indicative of only a very narrow range of performance, but I think they're a good indicator of where general performance will go in the future, as we're able to expand these optimizations to a wider range of methods.
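For readers without the tree checked out, a recursive fib benchmark of this kind looks roughly like the sketch below. This is an assumption about the general shape, not the contents of test/bench/bench_fib_recursive.rb: the method name, the argument, and the iteration count are all chosen here for illustration. Printing a Benchmark.measure result gives the four-column user/system/total/(real) format seen in the output that follows.

```ruby
require 'benchmark'

# Naive doubly recursive Fibonacci. Nearly all the time goes to
# method dispatch and small-integer math, which is why it is so
# sensitive to call-path changes.
def fib_sketch(n)
  if n < 2
    n
  else
    fib_sketch(n - 2) + fib_sketch(n - 1)
  end
end

# Assumed values; kept small so the sketch runs quickly.
ITERATIONS = 8
FIB_ARG = 20

ITERATIONS.times do
  # Prints one line of user, system, total and (real) seconds.
  puts Benchmark.measure { fib_sketch(FIB_ARG) }
end
```

Running several iterations in one process matters on the JVM, since the first pass or two include warmup before the JIT has compiled the hot methods.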

JRuby before the changes:

~/NetBeansProjects/jruby $ jruby -J-server -O test/bench/bench_fib_recursive.rb
  1.039000   0.000000   1.039000 (  1.039000)
  1.182000   0.000000   1.182000 (  1.182000)
  1.201000   0.000000   1.201000 (  1.201000)
  1.197000   0.000000   1.197000 (  1.197000)
  1.208000   0.000000   1.208000 (  1.208000)
  1.202000   0.000000   1.202000 (  1.202000)
  1.187000   0.000000   1.187000 (  1.187000)
  1.188000   0.000000   1.188000 (  1.188000)

JRuby after:

~/NetBeansProjects/jruby $ jruby -J-server -O test/bench/bench_fib_recursive.rb
  0.864000   0.000000   0.864000 (  0.863000)
  0.640000   0.000000   0.640000 (  0.640000)
  0.637000   0.000000   0.637000 (  0.637000)
  0.637000   0.000000   0.637000 (  0.637000)
  0.642000   0.000000   0.642000 (  0.642000)
  0.643000   0.000000   0.643000 (  0.643000)
  0.652000   0.000000   0.652000 (  0.652000)
  0.637000   0.000000   0.637000 (  0.637000)

This is probably the largest performance boost since the early days of the compiler, and it's by far the fastest fib has ever run. Here are MRI's and YARV's numbers for comparison.

MRI:

~/NetBeansProjects/jruby $ ruby test/bench/bench_fib_recursive.rb
  1.760000   0.010000   1.770000 (  1.813867)
  1.750000   0.010000   1.760000 (  1.827066)
  1.760000   0.000000   1.760000 (  1.796172)
  1.760000   0.010000   1.770000 (  1.822739)
  1.740000   0.000000   1.740000 (  1.800645)
  1.750000   0.010000   1.760000 (  1.751270)
  1.750000   0.000000   1.750000 (  1.778388)
  1.740000   0.000000   1.740000 (  1.755024)

And YARV:

~/NetBeansProjects/ruby1.9 $ ./ruby -I lib ../jruby/test/bench/bench_fib_recursive.rb
  0.390000   0.000000   0.390000 (  0.398399)
  0.390000   0.000000   0.390000 (  0.412120)
  0.400000   0.010000   0.410000 (  0.424013)
  0.400000   0.000000   0.400000 (  0.415217)
  0.400000   0.000000   0.400000 (  0.409039)
  0.390000   0.000000   0.390000 (  0.415853)
  0.400000   0.000000   0.400000 (  0.415201)
  0.400000   0.000000   0.400000 (  0.504051)

What I think is really awesome is that I'm comfortable showing YARV's numbers, since we're getting so close--and YARV has a bunch of integer math optimizations we thought we'd never be able to compete with. Well, I guess we can.

However, a more reasonable benchmark is the "pentomino" benchmark in the YARV suite. We've always been slower on it, and much slower some time ago when nothing compiled. Here's JRuby before the changes:

~/NetBeansProjects/jruby $ time jruby -J-server -O test/bench/yarv/bm_app_pentomino.rb

real    1m50.463s
user    1m49.990s
sys     0m1.131s

And after:

~/NetBeansProjects/jruby $ time jruby -J-server -O test/bench/yarv/bm_app_pentomino.rb

real    1m25.906s
user    1m26.393s
sys     0m0.946s

MRI:

~/NetBeansProjects/jruby $ time ruby test/bench/yarv/bm_app_pentomino.rb

real    1m47.635s
user    1m47.287s
sys     0m0.138s

And YARV:

~/NetBeansProjects/ruby1.9 $ time ./ruby -I lib ../jruby/test/bench/yarv/bm_app_pentomino.rb

real    0m49.733s
user    0m49.543s
sys     0m0.104s

Again, keep in mind that YARV is optimized around these benchmarks, so it's not surprising it would still be faster. But with these recent changes--general-purpose changes that are not targeted at any specific benchmark--we're now less than 2x slower.
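The "less than 2x" figure can be checked from the wall-clock ("real") pentomino times quoted above. The small script below just does that arithmetic; the helper name and variable names are made up for the sketch, and the inputs are copied from the timings in this message.

```ruby
# Convert a "XmY.YYYs" timing into seconds.
def seconds(minutes, secs)
  minutes * 60 + secs
end

# "real" times for bm_app_pentomino.rb, as quoted in this message.
jruby_before = seconds(1, 50.463)
jruby_after  = seconds(1, 25.906)
mri          = seconds(1, 47.635)
yarv         = seconds(0, 49.733)

puts format("JRuby after vs. before: %.2fx faster", jruby_before / jruby_after)
puts format("JRuby after vs. MRI:    %.2fx faster", mri / jruby_after)
puts format("JRuby after vs. YARV:   %.2fx slower", jruby_after / yarv)
```

On these numbers JRuby comes out at roughly 1.7x YARV's time, and now ahead of MRI where before the changes it was slightly behind.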

My confidence has been wholly restored.

- Charlie
