Ok, trying this discussion again :)
JRuby will always have to bear the burden of heap-allocated frames. We
can do various tricks to eliminate them in cases where they're not
needed (usually through static inspection, but sometimes through dynamic
profiling), but there are cases where they'll always exist. In essence,
these cases can be narrowed down to a few specific categories (a rough
detection sketch follows the list):
- Methods that can access the caller's scope or frame. This includes
things like eval, methods that access $~ and $_ "local globals", and
'binding', which captures the caller's execution binding.
- Methods that can manipulate the caller's scope or frame. This includes
the aforementioned eval, visibility methods (public/private/protected
are methods in Ruby), and any methods that modify $~ or $_.
- Closures, which both execute within their containing method's frame
and have access to their containing frame's local variables.
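To make that concrete, the static inspection mentioned above boils down to
scanning a method body for any of these constructs; if none appear, the heap
frame is a candidate for elimination. A rough sketch (simplified, with made-up
types and names rather than anything from the real compiler):

import java.util.List;
import java.util.Set;

final class FrameRequirementSketch {
    // Illustrative model of one syntactic feature found in a parsed method body.
    record Feature(String callName, boolean usesBackrefOrLastline, boolean hasBlock) {}

    // Builtins that read or manipulate the caller's frame or scope
    // (an assumed, incomplete list, for illustration only).
    private static final Set<String> FRAME_AWARE_CALLS = Set.of(
            "eval", "binding", "public", "private", "protected");

    static boolean needsHeapFrame(List<Feature> body) {
        for (Feature f : body) {
            if (f.callName() != null && FRAME_AWARE_CALLS.contains(f.callName()))
                return true;                            // eval/binding/visibility methods
            if (f.usesBackrefOrLastline()) return true; // touches $~ or $_
            if (f.hasBlock()) return true;              // closures share the containing frame
        }
        return false;
    }
}

A check like this can only be a heuristic, of course, since any of those names
can be aliased or redefined at runtime; that's exactly why the optimistic
frame-elimination mode shown later isn't always correct yet.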
We already reduce the cost of frames for closure dispatch by splitting
the concept of a variable scope out of the frame and only allocating new
variable scopes for closures (they share a frame with their containing
method). But the frame cost still exists for many methods, and until we
get better optimizations in place it exists for *most* methods.
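Roughly, that split looks like this (a simplified sketch with made-up class
and field names, not the real classes): entering a closure allocates only a
small scope object chained to its parent, while the frame reference is simply
shared.

final class VariableScope {
    final Object[] locals;
    final VariableScope parent;   // lets a closure reach its containing method's locals

    VariableScope(int size, VariableScope parent) {
        this.locals = new Object[size];
        this.parent = parent;
    }
}

final class CallState {
    final Object frame;           // stands in for the frame object shown below
    final VariableScope scope;

    CallState(Object frame, VariableScope scope) {
        this.frame = frame;
        this.scope = scope;
    }

    // Entering a closure: allocate only a new scope, reuse the containing
    // method's frame.
    CallState enterClosure(int closureLocalCount) {
        return new CallState(frame, new VariableScope(closureLocalCount, scope));
    }
}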
We also reduce framing cost by pre-allocating the frame stack. So the
problem with framing, for us, is the cost of initializing a frame's
fields for every call:
    public void updateFrame(
            RubyModule klazz, IRubyObject self, String name,
            Block block, String fileName, int line,
            JumpTarget jumpTarget) {
        assert block != null :
            "Block uses null object pattern. It should NEVER be null";

        // Call-site state passed in on every invocation.
        this.self = self;
        this.name = name;
        this.klazz = klazz;
        this.fileName = fileName;
        this.line = line;
        this.block = block;
        this.jumpTarget = jumpTarget;

        // Per-call defaults that must be reset whether or not they're used.
        this.visibility = Visibility.PUBLIC;
        this.isBindingFrame = false;
        this.backref = null;
        this.lastline = null;
    }
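Because the stack is pre-allocated, none of this involves allocation on the
call path; a minimal sketch of that arrangement (made-up names, not the actual
ThreadContext code, with stack growth omitted) shows the per-call cost is
exactly this field initialization plus a little index bookkeeping:

final class FrameStackSketch {
    private final Frame[] frames;
    private int index = -1;

    FrameStackSketch(int depth) {
        frames = new Frame[depth];
        // Allocate every frame up front; calls only reuse them.
        for (int i = 0; i < depth; i++) frames[i] = new Frame();
    }

    // Per call: no allocation, just grab the next pre-built frame and fill it in.
    Frame push(RubyModule klazz, IRubyObject self, String name,
               Block block, String fileName, int line, JumpTarget jumpTarget) {
        Frame frame = frames[++index];
        frame.updateFrame(klazz, self, name, block, fileName, line, jumpTarget);
        return frame;
    }

    void pop() {
        index--;   // the Frame object stays in place for reuse
    }
}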
For a bare method (def foo; end) with no arguments, the cost of framing
is substantial:
~/NetBeansProjects/jruby ➔ jruby -J-server test/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
1.590000 0.000000 1.590000 ( 1.590000)
0.868000 0.000000 0.868000 ( 0.868000)
0.638000 0.000000 0.638000 ( 0.638000)
0.624000 0.000000 0.624000 ( 0.623000)
0.597000 0.000000 0.597000 ( 0.597000)
0.593000 0.000000 0.593000 ( 0.593000)
0.593000 0.000000 0.593000 ( 0.593000)
0.597000 0.000000 0.597000 ( 0.597000)
0.593000 0.000000 0.593000 ( 0.593000)
0.604000 0.000000 0.604000 ( 0.604000)
~/NetBeansProjects/jruby ➔ jruby -J-server -J-Djruby.compile.fastest=true test/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
0.740000 0.000000 0.740000 ( 0.741000)
0.278000 0.000000 0.278000 ( 0.277000)
0.174000 0.000000 0.174000 ( 0.174000)
0.152000 0.000000 0.152000 ( 0.151000)
0.162000 0.000000 0.162000 ( 0.161000)
0.147000 0.000000 0.147000 ( 0.147000)
0.142000 0.000000 0.142000 ( 0.142000)
0.131000 0.000000 0.131000 ( 0.131000)
0.139000 0.000000 0.139000 ( 0.139000)
0.155000 0.000000 0.155000 ( 0.155000)
~/NetBeansProjects/jruby ➔ cat test/bench/language/bench_method_dispatch_only.rb
require 'benchmark'

def foo
  self
end

def invoking
  a = [];
  i = 0;
  while i < 100000
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    i += 1;
  end
end

puts "Test ruby method: 100k loops calling self's foo 100 times"
10.times {
  puts Benchmark.measure {
    invoking
  }
}
(jruby.compile.fastest optimistically eliminates frames where it
believes it's safe; it's not always correct yet.) So obviously we want
to eliminate or reduce this cost as much as possible. I have confirmed
that reducing the number of fields updated does improve performance,
and it pains me to see how large that improvement is. If, for example,
I just remove two fields from the frame initialization:
~/NetBeansProjects/jruby ➔ jruby -J-server test/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
1.603000 0.000000 1.603000 ( 1.602000)
0.910000 0.000000 0.910000 ( 0.910000)
0.626000 0.000000 0.626000 ( 0.626000)
0.588000 0.000000 0.588000 ( 0.588000)
0.581000 0.000000 0.581000 ( 0.580000)
0.577000 0.000000 0.577000 ( 0.577000)
0.578000 0.000000 0.578000 ( 0.578000)
0.571000 0.000000 0.571000 ( 0.571000)
0.575000 0.000000 0.575000 ( 0.575000)
0.582000 0.000000 0.582000 ( 0.582000)
If I pull out more fields, performance incrementally improves. The final
improvement is to avoid both the frame update and the stack-handling
logic entirely, which brings it down to the fastest times above (the
compile.fastest run).
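For concreteness, a reduced update along those lines would look roughly
like this (illustrative only; the fields skipped here are arbitrary, and
skipping any of them is only safe for call sites proven never to observe
them):

    // Illustrative trimmed update, not existing code. Which fields can be
    // skipped depends entirely on what the callee and its callers can observe.
    public void updateFrameLite(RubyModule klazz, IRubyObject self, String name,
                                String fileName, int line) {
        this.self = self;
        this.name = name;
        this.klazz = klazz;
        this.fileName = fileName;
        this.line = line;
        // block, jumpTarget, visibility, isBindingFrame, backref and lastline
        // are deliberately left untouched and may hold stale values from a
        // prior call.
    }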
Any thoughts on ways to improve this? Would a description of the stack
be useful here? Is there anything we should be doing to the Frame itself
to make it faster and dumber?
- Charlie