Subbu: I figured I'd move IR discussions to the dev list, so others
can jump in as needed. Reply to the list...

I had a few more thoughts on IR we're probably missing and will want to
incorporate somehow: framing and scoping stuff.

Obviously the IR already captures whether a closure accesses its own
variables or captured variables, right? But I think where we'd get
even more benefit is by having the IR also include information about
heap-based structures we currently allocate outside of the compiled
code.

So for example, if we know we're doing a call that is likely to access
the caller's frame, like "public" or "private", the IR could also
include information about preparing a frame or ensuring a frame has
already been prepared. This could allow us to lazily stand up those
structures only when needed, and potentially only stand up the parts
we really want (like preparing a frame-local "visibility" slot on some
threadlocal that subsequent calls could then use).
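
Rough sketch in Java of what I'm picturing (all of these names are
made up for illustration, not real JRuby classes):

    import java.util.EnumSet;

    // A frame that is only partially materialized: the IR tells us
    // which slots a downstream call will actually touch.
    enum FrameField { SELF, VISIBILITY, BLOCK, BACKREF }

    class LazyFrame {
        private Object visibility; // only filled in if a callee needs it
        private Object block;

        void prepare(EnumSet<FrameField> needed) {
            // Stand up only the slots the compiler proved are accessed;
            // e.g. a call to "private" only reads/writes VISIBILITY.
            if (needed.contains(FrameField.VISIBILITY)) visibility = "PUBLIC";
            if (needed.contains(FrameField.BLOCK)) block = null;
            // SELF, BACKREF, etc. are left untouched unless requested.
        }
    }

    class PrepareFrameDemo {
        public static void main(String[] args) {
            LazyFrame frame = new LazyFrame();
            // The IR for a method body that calls "private" would carry
            // something like a PrepareFrame(VISIBILITY) instruction:
            frame.prepare(EnumSet.of(FrameField.VISIBILITY));
        }
    }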

The largest areas where we lose execution performance are as follows:

1. Boxed numeric overhead
2. Complexities of call protocol, like argument list boxing
3. Heap-based call structures like frames and scopes

The first area we are already thinking about addressing in the new IR.
We'll propagate types as much as possible, make assumptions (or
install guards) for numeric methods, and use profiled type information
to specialize code paths. That's all fairly straightforward. We'll
also be able to start taking advantage of escape analysis in recent
Java 6 releases and in openjdk7 builds. When coupled with call
protocol simplifications, we should be able to use all this to improve
numeric performance.
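
To make that concrete, here's the kind of guarded fast path I have in
mind, assuming profiling told us both operands are Fixnums (toy
stand-in classes, not our real ones):

    // Toy guarded fast path for a + b when profiling says both sides
    // are Fixnums. RFixnum is a stand-in for our real boxed Fixnum.
    interface RObject { }

    final class RFixnum implements RObject {
        final long value;
        RFixnum(long value) { this.value = value; }
    }

    class GuardedPlus {
        static RObject plus(RObject a, RObject b) {
            // Guard: if the profiled assumption still holds, stay on the
            // unboxed long path.
            if (a instanceof RFixnum && b instanceof RFixnum) {
                long sum = ((RFixnum) a).value + ((RFixnum) b).value;
                // Escape analysis may be able to elide this box entirely.
                return new RFixnum(sum);
            }
            // Guard failed: fall back to the generic dynamic dispatch.
            return genericCall(a, "+", b);
        }

        static RObject genericCall(RObject recv, String name, RObject arg) {
            throw new UnsupportedOperationException("slow path elided here");
        }
    }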

The second area is going to require a more general-purpose
code-generation utility. All method objects in JRuby's method tables
are some subclass of DynamicMethod. Right now we generate "method
handles" called "Invokers" for all core class methods. This amounts to
hundreds of tiny subclasses of DynamicMethod that provide
arity-specific call paths and a unique, inlinable sequence of code. At
runtime, when a method is jitted, we generate it as a blob of code in
its own class in its own classloader, and that is wrapped with a
JittedMethod object. Jitting also flips the invalidation token on the
class, and the caching logic knows to cache the JittedMethod instead
of the containing "DefaultMethod" where the original interpreted code
lives. For AOT compiled code, we generate
Invokers at runtime that then directly dispatch to the blobs of
compiled Ruby.
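
For anyone who hasn't dug into the invoker generation, the generated
classes are basically this shape (heavily simplified; the real
DynamicMethod, ThreadContext, and IRubyObject types carry a lot more
plumbing):

    // The "Invoker" idea, heavily simplified: a tiny, arity-specific
    // subclass gives the JVM a monomorphic, inlinable call path and
    // avoids boxing arguments into an array.
    abstract class DynamicMethodSketch {
        abstract Object call(Object self);              // arity 0
        abstract Object call(Object self, Object arg0); // arity 1
    }

    // One of the hundreds of generated invokers for core class methods.
    final class StringReverseInvoker extends DynamicMethodSketch {
        Object call(Object self) {
            // Direct dispatch to the core implementation.
            return new StringBuilder((String) self).reverse().toString();
        }

        Object call(Object self, Object arg0) {
            throw new IllegalArgumentException("wrong number of arguments (1 for 0)");
        }
    }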

This all involves a lot of code; too much of it is hand-written rather
than generated, and what we do generate is too large (well over 1000
invokers for all core class methods, for example). I believe we need to improve
this protocol, ideally making it possible to *statically* bind some
calls when we can determine exact object types early on. We also have
a potential need to allow Object to pass through our call protocols as
easily as IRubyObject, which makes it even more imperative that we
simplify and generate as much of that code as possible.
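
By "statically bind" I mean something like this (again just a sketch;
the real dynamic path goes through call sites, caches, and the
invalidation token):

    class BindingSketch {
        // What we emit today, roughly: every call goes through a dynamic
        // lookup keyed off the receiver's class and its invalidation token.
        static Object dynamicCall(Object recv, String name, Object arg) {
            // ...cache check, method table lookup, DynamicMethod.call()...
            throw new UnsupportedOperationException("dynamic path elided");
        }

        // What we could emit when we know the exact type early on: a plain
        // Java call the JVM can inline, with no per-call lookup at all.
        static long staticallyBoundFixnumPlus(long self, long arg) {
            return self + arg;
        }
    }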

Thinking about the whole system makes me realize we've got a ton of
room for improving performance.

- Charlie
