Ok, trying this discussion again :)
JRuby will always have to bear the burden of heap-allocated frames. We
can do various tricks to eliminate them in cases where they're not
needed (usually through static inspection, but sometimes through dynamic
profiling), but there are cases where they'll always exist. In essence,
these cases can be narrowed down to a few specific categories (a rough
detection sketch follows the list):
- Methods that can access the caller's scope or frame. This includes
things like eval, methods that access $~ and $_ "local globals", and
'binding', which captures the caller's execution binding.
- Methods that can manipulate the caller's scope or frame. This includes
the aforementioned eval, visibility methods (public/private/protected
are methods in Ruby), and any methods that modify $~ or $_.
- Closures, which both execute within their containing method's frame
and have access to their containing frame's local variables.
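To make that concrete, the static inspection mentioned above boils down to
scanning a method body for any of these constructs; if none appear, the heap
frame is a candidate for elimination. A rough sketch (simplified, with made-up
types and names rather than anything from the real compiler):

import java.util.List;
import java.util.Set;

final class FrameRequirementSketch {
    // Illustrative model of one syntactic feature found in a parsed method body.
    record Feature(String callName, boolean usesBackrefOrLastline, boolean hasBlock) {}

    // Builtins that read or manipulate the caller's frame or scope
    // (an assumed, incomplete list, for illustration only).
    private static final Set<String> FRAME_AWARE_CALLS = Set.of(
            "eval", "binding", "public", "private", "protected");

    static boolean needsHeapFrame(List<Feature> body) {
        for (Feature f : body) {
            if (f.callName() != null && FRAME_AWARE_CALLS.contains(f.callName()))
                return true;                            // eval/binding/visibility methods
            if (f.usesBackrefOrLastline()) return true; // touches $~ or $_
            if (f.hasBlock()) return true;              // closures share the containing frame
        }
        return false;
    }
}

A check like this can only be a heuristic, of course, since any of those names
can be aliased or redefined at runtime; that's exactly why the optimistic
frame-elimination mode shown later isn't always correct yet.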
We already reduce the cost of frames for closure dispatch by splitting
the concept of a variable scope out of the frame and only allocating new
variable scopes for closures (they share a frame with their containing
method). But the frame cost still exists for many methods, and until we
get better optimizations in place it exists for *most* methods.
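Roughly, that split looks like this (a simplified sketch with made-up class
and field names, not the real classes): entering a closure allocates only a
small scope object chained to its parent, while the frame reference is simply
shared.

final class VariableScope {
    final Object[] locals;
    final VariableScope parent;   // lets a closure reach its containing method's locals

    VariableScope(int size, VariableScope parent) {
        this.locals = new Object[size];
        this.parent = parent;
    }
}

final class CallState {
    final Object frame;           // stands in for the frame object shown below
    final VariableScope scope;

    CallState(Object frame, VariableScope scope) {
        this.frame = frame;
        this.scope = scope;
    }

    // Entering a closure: allocate only a new scope, reuse the containing
    // method's frame.
    CallState enterClosure(int closureLocalCount) {
        return new CallState(frame, new VariableScope(closureLocalCount, scope));
    }
}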
We also reduce framing cost by pre-allocating the frame stack. So the
problem with framing, for us, is the cost of initializing a frame's
fields for every call:
    public void updateFrame(
            RubyModule klazz, IRubyObject self, String name,
            Block block, String fileName, int line,
            JumpTarget jumpTarget) {
        assert block != null :
            "Block uses null object pattern. It should NEVER be null";

        // Call-site state passed in on every invocation.
        this.self = self;
        this.name = name;
        this.klazz = klazz;
        this.fileName = fileName;
        this.line = line;
        this.block = block;
        this.jumpTarget = jumpTarget;

        // Per-call defaults that must be reset whether or not they're used.
        this.visibility = Visibility.PUBLIC;
        this.isBindingFrame = false;
        this.backref = null;
        this.lastline = null;
    }
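Because the stack is pre-allocated, none of this involves allocation on the
call path; a minimal sketch of that arrangement (made-up names, not the actual
ThreadContext code, with stack growth omitted) shows the per-call cost is
exactly this field initialization plus a little index bookkeeping:

final class FrameStackSketch {
    private final Frame[] frames;
    private int index = -1;

    FrameStackSketch(int depth) {
        frames = new Frame[depth];
        // Allocate every frame up front; calls only reuse them.
        for (int i = 0; i < depth; i++) frames[i] = new Frame();
    }

    // Per call: no allocation, just grab the next pre-built frame and fill it in.
    Frame push(RubyModule klazz, IRubyObject self, String name,
               Block block, String fileName, int line, JumpTarget jumpTarget) {
        Frame frame = frames[++index];
        frame.updateFrame(klazz, self, name, block, fileName, line, jumpTarget);
        return frame;
    }

    void pop() {
        index--;   // the Frame object stays in place for reuse
    }
}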
For a bare method (def foo; end) with no arguments, the cost of framing
is substantial:
~/NetBeansProjects/jruby ➔ jruby -J-server test/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
1.590000 0.000000 1.590000 ( 1.590000)
0.868000 0.000000 0.868000 ( 0.868000)
0.638000 0.000000 0.638000 ( 0.638000)
0.624000 0.000000 0.624000 ( 0.623000)
0.597000 0.000000 0.597000 ( 0.597000)
0.593000 0.000000 0.593000 ( 0.593000)
0.593000 0.000000 0.593000 ( 0.593000)
0.597000 0.000000 0.597000 ( 0.597000)
0.593000 0.000000 0.593000 ( 0.593000)
0.604000 0.000000 0.604000 ( 0.604000)
~/NetBeansProjects/jruby ➔ jruby -J-server -J-Djruby.compile.fastest=true test/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
0.740000 0.000000 0.740000 ( 0.741000)
0.278000 0.000000 0.278000 ( 0.277000)
0.174000 0.000000 0.174000 ( 0.174000)
0.152000 0.000000 0.152000 ( 0.151000)
0.162000 0.000000 0.162000 ( 0.161000)
0.147000 0.000000 0.147000 ( 0.147000)
0.142000 0.000000 0.142000 ( 0.142000)
0.131000 0.000000 0.131000 ( 0.131000)
0.139000 0.000000 0.139000 ( 0.139000)
0.155000 0.000000 0.155000 ( 0.155000)
~/NetBeansProjects/jruby ➔ cat test/bench/language/bench_method_dispatch_only.rb
require 'benchmark'

def foo
  self
end

def invoking
  a = [];
  i = 0;
  while i < 100000
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo;
    i += 1;
  end
end

puts "Test ruby method: 100k loops calling self's foo 100 times"
10.times {
  puts Benchmark.measure {
    invoking
  }
}
(jruby.compile.fastest optimistically eliminates frames where it
believes it's safe; it's not always correct yet.) So obviously we want
to eliminate or reduce this cost as much as possible. I have confirmed
that reducing the number of fields updated does improve performance,
and it pains me to see how large that improvement is. If, for example,
I just remove two fields from the frame initialization:
~/NetBeansProjects/jruby ➔ jruby -J-server test/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
1.603000 0.000000 1.603000 ( 1.602000)
0.910000 0.000000 0.910000 ( 0.910000)
0.626000 0.000000 0.626000 ( 0.626000)
0.588000 0.000000 0.588000 ( 0.588000)
0.581000 0.000000 0.581000 ( 0.580000)
0.577000 0.000000 0.577000 ( 0.577000)
0.578000 0.000000 0.578000 ( 0.578000)
0.571000 0.000000 0.571000 ( 0.571000)
0.575000 0.000000 0.575000 ( 0.575000)
0.582000 0.000000 0.582000 ( 0.582000)
If I pull out more fields, performance incrementally improves. The final
improvement is to avoid both the frame update and the stack-handling
logic entirely, which brings it down to the fastest times above (the
compile.fastest run).
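For concreteness, a reduced update along those lines would look roughly
like this (illustrative only; the fields skipped here are arbitrary, and
skipping any of them is only safe for call sites proven never to observe
them):

    // Illustrative trimmed update, not existing code. Which fields can be
    // skipped depends entirely on what the callee and its callers can observe.
    public void updateFrameLite(RubyModule klazz, IRubyObject self, String name,
                                String fileName, int line) {
        this.self = self;
        this.name = name;
        this.klazz = klazz;
        this.fileName = fileName;
        this.line = line;
        // block, jumpTarget, visibility, isBindingFrame, backref and lastline
        // are deliberately left untouched and may hold stale values from a
        // prior call.
    }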
Any thoughts on ways to improve this? Would a description of the stack
be useful here? Is there anything we should be doing to the Frame itself
to make it faster and dumber?
- Charlie