> Obviously the IR already captures whether a closure accesses its own
> variables or captured variables, right?
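For concreteness, here is a small Ruby example of the two kinds of variable access the question distinguishes (illustrative only, not JRuby internals):

```ruby
def counter
  count = 0              # lives outside the block: must be captured
  increment = lambda do
    step = 1             # local to the block body: no capture needed
    count += step        # read/write of a captured variable would become
  end                    # an explicit frame load/store in a lower-level IR
  increment.call
  increment.call
  count
end

puts counter             # => 2
```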
Implicitly, yes. This information will become explicit after reaching-definitions / live-variable analysis is done, which tells us whether the variables being accessed live entirely within the closure body or come from outside it. At that point, we can make this info explicit. Captured variables will have to be stored to / loaded from the frame, and these instructions will be made explicit in the IR so that unnecessary loads/stores can be removed. In addition, making this explicit will keep the code generation phase simple.

> So for example, if we know we're doing a call that is likely to access
> the caller's frame, like "public" or "private", the IR could also
> include information about preparing a frame or ensuring a frame has
> already been prepared. This could allow us to lazily stand up those
> structures only when needed, and potentially only stand up the parts
> we really want (like preparing a frame-local "visibility" slot on some
> threadlocal that could then be used by the subsequent calls).

Makes sense. By frame, are you referring to the standard stack call frame, or is it some other heap structure specific to the implementation? I presume the latter.

> The largest areas where we lose execution performance are as follows:
>
> 1. Boxed numeric overhead
> 2. Complexities of call protocol, like argument list boxing
> 3. Heap-based call structures like frames and scopes
>
> The first area we are already thinking about addressing in the new IR.
> We'll propagate types as much as possible, make assumptions (or
> install guards) for numeric methods, and use profiled type information
> to specialize code paths. That's all fairly straightforward. We'll
> also be able to start taking advantage of escape analysis in recent
> Java 6 releases and in openjdk7 builds. When coupled with call
> protocol simplifications, we should be able to use all this to improve
> numeric performance.
>
> The second area is going to require a more general-purpose
> code-generation utility.
> All method objects in JRuby's method tables are some subclass of
> DynamicMethod. Right now we generate "method handles" called
> "Invokers" for all core class methods. This amounts to hundreds of
> tiny subclasses of DynamicMethod that provide arity-specific call
> paths and a unique, inlinable sequence of code. At runtime, when a
> method is jitted, we generate it as a blob of code in its own class in
> its own classloader, and that is wrapped with a JittedMethod object.
> Jitting also triggers the invalidation token on a class to be
> "flipped", and the caching logic knows to cache JittedMethod instead
> of the containing "DefaultMethod" where the original interpreted code
> lives. For AOT compiled code, we generate Invokers at runtime that
> then directly dispatch to the blobs of compiled Ruby.
>
> This all involves a lot of code, and while too much of it is not
> generated, what we do generate is too large (well over 1000 invokers
> for all core class methods, for example). I believe we need to improve
> this protocol, ideally making it possible to *statically* bind some
> calls when we can determine exact object types early on. We also have
> a potential need to allow Object to pass through our call protocols as
> easily as IRubyObject, which makes it even more imperative that we
> simplify and generate as much of that code as possible.

After whatever analyses we choose to perform on the current high-level IR, the high-level call instruction can be converted to a lower-level IR where some of these details are made explicit. I need to understand the current call protocol, with all the boxing and wrapping involved, in greater detail before commenting further. But yes, it should be possible to reduce some of these overheads. For example, you could have different flavors of call instructions depending on whether the call target is statically known, and whether an inline cache is needed.
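As a hand-rolled sketch of the inline-cache flavor (the class and names here are invented for illustration, not JRuby's actual machinery): a call site caches the method it looked up for one receiver class and only falls back to a full method-table lookup when the receiver's class changes.

```ruby
# Hypothetical monomorphic inline cache, modeled at the Ruby level.
class CallSite
  attr_reader :lookups

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @lookups = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      @lookups += 1                             # slow path: method table lookup
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    @cached_method.bind(receiver).call(*args)   # fast path on cache hits
  end
end

site = CallSite.new(:upcase)
3.times { site.call("abc") }    # one lookup, then two cache hits
puts site.lookups               # => 1
```

A "statically known target" flavor of the call instruction would skip even the cache check and bind the method at compile time.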
By making method lookups explicit, you can eliminate duplicate method table loads (assuming objects have pointers to their method tables). Consider this:

  o.m1(..)
  o.m2(..)

Since the type of o hasn't changed between the two calls, you can skip the method table load for the second call. Anyway, I need to understand the call protocol in greater detail to comment more.

Subbu.

> Thinking about the whole system makes me realize we've got a ton of
> room for improving performance.
>
> - Charlie
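A user-level Ruby analogue of the duplicate-load elimination described above (illustrative only; the real optimization would happen in the generated code, not in Ruby source): fetch the receiver's class once and let both lookups share it.

```ruby
o = "hi"
klass = o.class                       # method table loaded once
m1 = klass.instance_method(:upcase)   # both lookups reuse that load,
m2 = klass.instance_method(:length)   # since o's type is unchanged

puts m1.bind(o).call    # => HI
puts m2.bind(o).call    # => 2
```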
