This note interweaves code optimization with discussion about IR because
the IR choice will be influenced by what you want to do with it.  I am
going to use examples below to guide this discussion.  A lot of it is
somewhat 'stream-of-thought' writing -- this is not a tight spec.

Since we let HotSpot do a lot of heavy lifting, the goal of the IR and
associated transformations would be to transform the code by exploiting
available type information, inlining where necessary, method caching, etc.

Example 1:
   def add(a, b)
     a + b
   end

I'll use this example first to discuss what the IR needs to enable.
Even this simple example proves surprisingly complex in the face of
Ruby's dynamic nature, and it lets us explore different aspects of the
optimization problem.

Straightforward transformation to IR
Instructions are basically in a multi-operand format, and most instructions
compute a result which can be stored into a variable.

Attempt 1:
   a    = meth_arg(0)
   b    = meth_arg(1)
   args = [b]
   ret  = ocall a, '+', args
   return ret

This IR transformation is straightforward and there are no surprises here.
The ocall is basically a method call that does a method lookup prior to the
actual call.
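For intuition, here is roughly what an ocall does at runtime, expressed in
Ruby (a sketch; the helper name `ocall` is just for illustration, not part
of any actual IR):

```ruby
# Sketch: an 'ocall' IR instruction is a dynamic dispatch -- the method
# is looked up on the receiver and then invoked, on every single call.
def ocall(receiver, method_name, args)
  receiver.send(method_name, *args)   # lookup + invoke, fused together
end

ocall(3, :+, [4])      # => 7
ocall("a", :+, ["b"])  # => "ab"
```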

Alternatively, by splitting a call into a method-address computation
and the actual call, we might be able to eliminate repeated method-table
lookups (or whatever else is involved in identifying a method address).

Attempt 2:
   a    = meth_arg(0)
   b    = meth_arg(1)
   m    = meth_addr a, '+'
   args = [b]
   ret  = call m, args
   return ret

Here 'call' could be a regular C call for all you care.  So, what would
happen in the compiler is that you would first transform the code into
a higher-level IR (with virtual calls as above), and at some point in the
optimization, you will ratchet the IR down one level, which might
expose other optimizations such as this one.  If you took care to maintain
the SSA-ness during transformations, optimizations such as eliminating
duplicate method lookups can be done by the SCCP (sparse conditional
constant propagation) algorithm, which is linear and pretty fast for all
practical purposes.
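To make the payoff concrete, here is a toy Ruby pass over a hypothetical
triple-style IR (each instruction is [result, opcode, operands...]; all
names are mine) that reuses an earlier meth_addr result for an identical
lookup -- a simple local stand-in for what SCCP over SSA form would achieve:

```ruby
# Toy IR: each instruction is [result, opcode, operands...].
# Reuse the first meth_addr result for identical (receiver, method)
# lookups -- a simple common-subexpression elimination, standing in
# for the SSA/SCCP-based version described in the text.
def eliminate_duplicate_lookups(instrs)
  seen = {}   # [receiver, method] => variable already holding the address
  instrs.map do |res, op, *operands|
    if op == :meth_addr
      if (prev = seen[operands])
        [res, :copy, prev]          # reuse the earlier lookup
      else
        seen[operands] = res
        [res, op, *operands]
      end
    else
      [res, op, *operands]
    end
  end
end

ir = [
  [:m1, :meth_addr, :a, :+],
  [:r1, :call, :m1, :args1],
  [:m2, :meth_addr, :a, :+],   # duplicate lookup
  [:r2, :call, :m2, :args2],
]
eliminate_duplicate_lookups(ir)
# the second lookup becomes [:m2, :copy, :m1]
```

In a real compiler this reuse is only safe under guards, since a
monkey-patch between the two lookups would invalidate it.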

Open classes & JIT => All optimizations have to be guarded
Unlike Java, in Ruby, because of open classes, code can be modified at any
time and the code changes can have sweeping effect across the application.
For example, overriding the '+' in Fixnum is likely to affect lots of code.
So, whereas in Java, once you optimize code, that code is gold (except for
inheritance hierarchy changes -- which Hotspot catches anyway), in Ruby, all
optimizations need to guard against code modifications at all times.  If you
live entirely in JRuby land, you can always back out optimizations.

But, if you are going to JIT Ruby code and enter JVM/HotSpot land, you have
to build in optimization guards throughout the code.  This means the IR has
to incorporate guard instructions.  For example, here are a few types of
guard instructions:

   type_guard(o, Fixnum, L)     <-- if type of o is not Fixnum, jump to L
   class_guard(Fixnum, L)       <-- if Fixnum has been monkey-patched, jump to L
   method_guard(Fixnum, '+', L) <-- if Fixnum.+ has been monkey-patched, jump to L

With this in mind, assuming (however you get this information) you expect
Fixnums to dominate at this call site, you might compile the example above
into:

   a = meth_arg(0)
   b = meth_arg(1)
   type_guard(a, Fixnum, L1)
   type_guard(b, Fixnum, L1)
   method_guard(Fixnum, '+', L1)

   m    = meth_addr Fixnum, 'value'  # Assume there is a static method
   args = [a]
   v1   = call m, args
   m    = meth_addr Fixnum, 'value'  # Assume there is a static method
   args = [b]
   v2   = call m, args
   v3   = add v1, v2                 # A standard ALU operation (iadd bytecode)
   m    = meth_addr Fixnum, 'new'    # Assume there is a static method
   args = [v3]
   ret  = call m, args
   jump L2

L1:
   m    = meth_addr a, '+'
   args = [b]
   ret  = call m, args

L2:
   return ret
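At runtime, the guarded compilation above behaves roughly like the following
Ruby (a sketch; `plus_redefined?` stands in for whatever invalidation check
the method_guard compiles to, and Integer stands in for Fixnum, which modern
Ruby has folded into Integer):

```ruby
# Assumption: stands in for the method_guard's invalidation check;
# a real runtime would flip this when Integer#+ is monkey-patched.
def plus_redefined?
  false
end

# Runtime shape of the guarded fast path: the guards fall through to
# generic dispatch when any assumption fails.
def guarded_add(a, b)
  if a.is_a?(Integer) && b.is_a?(Integer) && !plus_redefined?
    a + b               # fast path: unbox, iadd, box
  else
    a.send(:+, b)       # L1: generic ocall-style dispatch
  end
end

guarded_add(2, 3)      # => 5
guarded_add("a", "b")  # => "ab" (slow path)
```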

Clearly, there is a lot of overhead unboxing the objects -- but that is
because the objects are being passed in as arguments.  But, you can see that
if this expression had one or both literals, the overhead drops.  If both
were literals, the code reduces to boxing the result into a Fixnum.  But, if
not, HotSpot might be able to inline the 'value' calls at this site and get
rid of the unboxing overhead.

JRuby Inlining
Since HotSpot won't really be able to optimize away the unboxing and add
type-based guards as above, at some point, you might have to embrace method
inlining.  For this, you will use the high-level IR.

Consider this call site (Ruby):

  n = add(3,5)

This translates to (IR):

  args = [3,5]
  n = call 'add', args

After inlining (ignoring any required inlining guards for now), this changes to:
   args = [3,5]
   a    = args[0]
   b    = args[1]
   args = [b]
   ret  = ocall a, '+', args
   n    = ret

When you convert to SSA and run SCCP (Sparse Conditional Constant
Propagation) on this, you get:
   args = [5]
   ret = ocall 3, '+', args

After introducing type guards and re-running SCCP, you get:
   method_guard(Fixnum, '+', L1)
   v = 8
   ... code to box v into a Fixnum ...
   n = v
   jump L2
L1:
   args = [5]
   n = ocall 3, '+', args
L2:

Clearly, this is still simplistic because you might be boxing the 8 into a
Fixnum prematurely, and it might never be needed.  So, in order to be able
to eliminate boxing and unboxing, it might be useful to introduce IR
instructions that represent boxing and unboxing of primitive types.  Since
the target is the JVM, you introduce boxing/unboxing IR instructions for JVM
primitives.  Fixnum and Float are obvious candidates.

So, box and unbox IR instructions could be:
   n = box_primitive(Fixnum, v)
   v = unbox_primitive(Fixnum, n)

With this, the inlined and optimized code above would become:
   method_guard(Fixnum, '+', L1)
   n = box_primitive(Fixnum, 8)
   jump L2
L1:
   args = [5]
   n = ocall 3, '+', args
L2:
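With box/unbox as first-class IR instructions, the key algebraic fact is
that unbox_primitive(C, box_primitive(C, v)) is just v.  A toy Ruby peephole
over a hypothetical triple-style IR ([result, opcode, operands...]; all
names are mine) makes this concrete:

```ruby
# Toy peephole: rewrite x = unbox_primitive(C, n) into x = v when n is
# known to come from box_primitive(C, v) in the same straight-line block.
def fold_unbox_of_box(instrs)
  boxed = {}   # result var => [class, source var]
  instrs.map do |res, op, *operands|
    case op
    when :box_primitive
      boxed[res] = operands          # remember [class, v]
      [res, op, *operands]
    when :unbox_primitive
      klass, n = operands
      if boxed[n] && boxed[n][0] == klass
        [res, :copy, boxed[n][1]]    # unbox(box(v)) => v
      else
        [res, op, *operands]
      end
    else
      [res, op, *operands]
    end
  end
end

ir = [
  [:a, :box_primitive, :Fixnum, :v],
  [:x, :unbox_primitive, :Fixnum, :a],
]
fold_unbox_of_box(ir)
# the unbox_primitive becomes [:x, :copy, :v]
```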

The next problem is to propagate type information throughout a method to
allow optimizations to be as global as possible.  In the optimization
analyses, you have the option of carrying around the state implicitly
(carried along in the analysis state while running analyses), or explicitly
(by introducing attribute instructions in the IR).  The implicit approach is
easy and preferable for cases where information about a variable is not
use-site specific (constant propagation, for example).  But for cases where
this is not true, the explicit approach is preferable.  For example, the
type of an object along the two paths of a branch would be different.  So, a
variable 'v' (in SSA form) might have type 'Fixnum' along one path and
'String' along another.

I prefer the explicit approach:
- attributes lend themselves to easier debugging because when you dump the
  IR, you can see what information is being generated.
- you convert use-site specific information to attributes that are no longer
  use-site specific (because every branched path gets its own SSA variable).
- by converting use-site specific information to constants, you can implement
  transformations as part of regular constant propagation via SCCP.

I have used this approach to good benefit in the JVM compiler I wrote for my
dissertation -- I used this approach to eliminate null checks and array
bounds checks.

So, with this in mind, you will now augment all IR instructions to include an
array of attributes.  So, the generic IR form will now be:

    v = OP(operands, attributes)

When you run SCCP on this, you also run SCCP on the attribute values.  This
lets you use attributes as additional meta-information to optimize the
instructions that use them.

For example, consider this sequence of IR instructions:

   v = 5
   a = box_primitive(Fixnum, v, [])
   a_attr = [BOXED_FIXNUM(v)]
   ... a is not used anywhere here ...
   b = a
   b_attr = a_attr
   x = unbox_primitive(Fixnum, b, [b_attr])
   y = x

When you run SCCP on this, here is what happens to the unbox_primitive
instruction:
   x = unbox_primitive(Fixnum, b, [[BOXED_FIXNUM(v)]])

Thus, you know that you are unboxing a boxed value 'v'.  So, you can
simplify this instruction to:
   x = v

So, when you run SCCP, the above sequence reduces to (I am excluding some
useless copies here):
   v = 5
   a = box_primitive(Fixnum, v, [])
   y = 5

After you run dead code elimination, assuming 'a' is not used anywhere, the
code simplifies to:
   v = 5
   y = 5
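The whole pipeline above -- constants and attributes flowing through copies,
then the attribute simplifying the unbox -- can be sketched as a single
forward pass in Ruby (a toy over the same hypothetical triple IR; all names
are mine, and this ignores control flow, which real SCCP handles):

```ruby
# Propagate constants and BOXED_FIXNUM attributes through copies, then
# simplify unbox_primitive using the attribute, mimicking the example above.
# Toy IR: [result, opcode, operands...].
def propagate_and_simplify(instrs)
  consts = {}   # var => known constant value
  attrs  = {}   # var => [:BOXED_FIXNUM, source_var]
  out = []
  instrs.each do |res, op, *operands|
    case op
    when :const
      consts[res] = operands[0]
      out << [res, op, *operands]
    when :box_primitive
      klass, v = operands
      attrs[res] = [:BOXED_FIXNUM, v] if klass == :Fixnum
      out << [res, op, *operands]
    when :copy
      src = operands[0]
      consts[res] = consts[src] if consts.key?(src)
      attrs[res]  = attrs[src]  if attrs.key?(src)   # b_attr = a_attr
      out << [res, op, *operands]
    when :unbox_primitive
      _klass, n = operands
      if attrs[n]
        v = attrs[n][1]                  # unboxing a known boxed 'v'
        consts[res] = consts[v] if consts.key?(v)
        out << [res, :copy, v]
      else
        out << [res, op, *operands]
      end
    else
      out << [res, op, *operands]
    end
  end
  out
end

ir = [
  [:v, :const, 5],
  [:a, :box_primitive, :Fixnum, :v],
  [:b, :copy, :a],                      # b = a; attributes flow with the copy
  [:x, :unbox_primitive, :Fixnum, :b],
  [:y, :copy, :x],
]
propagate_and_simplify(ir)
# the unbox_primitive becomes [:x, :copy, :v]
```

Dead code elimination over the result would then drop the box and the copies,
leaving just the constants, as in the text.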

To summarize the IR so far:

1. Every IR instruction is basically of the form
      v = OP(operand_array, attribute_array)

   Since this is a high-level IR, you are not bound to a 3-operand format.
   You can have an array of operands and an array of attributes.
   But, you can also have IR instructions of the form:
      v = OP(op1, attribute_array)
      v = OP(op1, op2, attribute_array)

2. You have regular ALU kind of IR instructions (add, sub, cmp, lshift, etc.)

3. You have various flavor of call instructions:
     v = call(method, args)
     v = ocall(receiver, method, args)
     ... and anything else you can think of ...

4. You have a method-address instruction:
     m = meth_addr(receiver, method)

5. Instructions to extract function/method arguments and return function values:
     x = meth_arg(arg_index)
     return y

6. You have various kinds of guard instructions:
     type_guard(obj, class, L)      <-- if type of obj is not class, jump to L
     class_guard(class, L)          <-- if class has been monkey-patched, jump to L
     method_guard(class, method, L) <-- if class.method has been monkey-patched, jump to L

7. You have box/unbox instructions for primitives:
     n = box_primitive(class, v)
     v = unbox_primitive(class, n)

8. You have attribute instructions for passing around meta-information
   (types, etc.):
     init_attr(a_attr, [])
     add_attr(a_attr, X)

    - a_attr is just a naming convention.  As far as representation is
      concerned, it is just another SSA IR variable.
    - You need to identify the different types of meta-information you might
      pass around.  The X above is a place-holder for that.

    Some kinds of X are:
      - BOXED_FIXNUM(n)
      - BOXED_ARRAY(a)
      - HAS_TYPE(class)
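The generic form v = OP(operands, attributes) from item 1 can be modeled as a
small Ruby class (a sketch; the class and method names are illustrative, not
an actual JRuby API):

```ruby
# Sketch of the generic IR instruction form: v = OP(operands, attributes).
class IRInstr
  attr_reader :result, :op, :operands, :attributes

  def initialize(result, op, operands, attributes = [])
    @result = result
    @op = op
    @operands = operands
    @attributes = attributes      # empty array when no meta-information
  end

  def to_s
    attr_part = attributes.empty? ? "" : ", [#{attributes.join(', ')}]"
    "#{result} = #{op}(#{operands.join(', ')}#{attr_part})"
  end
end

i = IRInstr.new(:x, :unbox_primitive, [:Fixnum, :b], ["BOXED_FIXNUM(v)"])
puts i   # x = unbox_primitive(Fixnum, b, [BOXED_FIXNUM(v)])
```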

This is it for the 1st pass.  More later with an example of closures.
