This note interweaves code optimization with discussion about IR because
the IR choice will be influenced by what you want to do with it.  I am
going to use examples below to guide this discussion.  A lot of it is
somewhat 'stream-of-thought' writing -- this is not a tight spec.

Since we let HotSpot do a lot of heavy lifting, the goal of the IR and
associated transformations would be to transform the code by exploting
available type information, inlining where necessary, method caching, etc.

Example 1:
---------
   def add(a, b)
     a + b
   end

I'll use this example first to discuss what the IR needs to enable.
Even this simple example proves impressively complex in the face of
Ruby's dynamic nature that lets us explore different aspects of the
optimization problem.

Straightforward transformation to IR
------------------------------------
Instructions are basically multi-operand format and most instructions
compute a result which can be stored into a variable.

Attempt 1:
   a    = meth_arg(0)
   b    = meth_arg(1)
   args = [b]
   ret  = ocall a, '+', args
   return ret

This IR transformation is straightforward and no surprises here.  The
ocall is basically a method call that does method lookup prior to the
call.

Alternatively, by splitting a call into a method-address computation
and the actual call, we might be able to eliminate repeated method-table
lookups (or whatever else is involved in identifying a method address).

Attempt 2:
   a    = meth_arg(0)
   b    = meth_arg(1)
   m    = meth_addr a, '+'
   args = [b]
   ret  = call m, args
   return ret

Here 'call' could be a regular C call for all you care.  So, what would
happen in the compiler is that you would first transform the code into
a higher-level IR (with virtual calls as above), and at some point in the
optimization, you will ratchet down the IR one level lower that might
expose other optimizations such as this one.  If you took care to maintain
the SSA-ness during transformations, optimizations such as eliminating
duplicate method look ups can be done by the SCCP (sparse conditional
constant
propagation) algorithm which is linear and pretty fast for all practical
purposes.

Open classes & JIT => All optimizations have to be guarded
----------------------------------------------------------
Unlike Java, in Ruby, because of open classes, code can be modified at any
time and the code changes can have sweeping effect across the application.
For example, overriding the '+' in Fixnum is likely to affect lots of code.
So, whereas in Java, once you optimize code, that code is gold (except for
inheritance hierarchy changes -- which Hotspot catches anyway), in Ruby, all
optimizations need to guard against code modifications at all times.  If you
live entirely in JRuby land, you can always back out optimizations.

But, if you are going to JIT Ruby code and enter JVM/HotSpot land, you have
to build in optimization guards throughout the code.  This means the IR has
to incorporate guard instructions.  For example, here are few types of
guards:

   type_guard(o, Fixnum, L)     <-- if type of o is not Fixnum, jump to L
   class_guard(Fixnum, L)       <-- if Fixnum has been monkey-patched, jump
to L
   method_guard(Fixnum, '+', L) <-- if Fixnum.+ has been monkey-patched,
jump to L

With this in mind, assuming (however you get this information) you expect
Fixnums
to dominate this call, you might compile the example above into:

----------
   a = meth_arg(0)
   b = meth_arg(1)
   type_guard(a, Fixnum, L1)
   type_guard(b, Fixnum, L1)
   method_guard(Fixnum, '+', L1)

   m    = meth_addr Fixnum, 'value'  # Assume there is a static method
somewhere
   args = [a]
   v1   = call, m, args
   m    = meth_addr Fixnum, 'value'  # Assume there is a static method
somewhere
   args = [b]
   v2   = call, m, args
   v3   = add v1, v2                 # This is a standard ALU operation
(iadd bytecode)
   m    = meth_addr Fixnum, 'new'    # Assume there is a static method
somewhere
   args = [v3]
   ret  = call m, args
   jump L2

L1:
   m    = meth_addr a, '+'
   args = [b]
   ret  = call m, args

L2:
   return ret
----------

Clearly, there is a lot of overhead unboxing the objects -- but that is
because
the objects are being passed in as arguments.  But, you can see that if this
expression had one or both literals, the overhead drops.  If both were
literals,
the code reduces to boxing the result into a Fixnum.  But, if not, HotSpot
might be
able to inline the 'value' calls at this site and get rid of the unboxing
overhead.

JRuby Inlining
--------------
Since HotSpot won't really be able to optimize the unboxing and add
type-based guards
as above, at some point, you might have to embrace method inlining.  For
inlining,
you will use the high-level IR.

Consider this call site (Ruby):

  ...
  n = add(3,5)
  ...

This translates to (IR):

  ...
  args = [3,5]
  n = call 'add', args
  ...

After inlining (ignoring any required inlining guards for now), this changes
to:
   args = [3,5]
   a    = args[0]
   b    = args[1]
   args = [b]
   ret  = ocall a, '+', args
   n    = ret

When you convert to SSA and run SCCP (Sparse Conditional Constant
Propagation) on this, you get:
   args = [5]
   ret = ocall 3, '+', args

After introducing type guards and re-running SCCP, you get:
   ...
   method_guard(Fixnum, '+', L1)
   v = 8
   ... code to box v into a Fixnum ...
   n = v
   jump L2
L1:
   args = [5]
   n = ocall 3, '+', args
L2:
   ...

Clearly, this is still simplistic because you might be boxing the 8 into a
Fixnum
prematurely, and it might never be needed.  So, in order to be able to
optimize
boxing and unboxing, it might be useful to introduce IR instructions that
encapsulate
boxing and unboxing of primitive types.  Since the target is the JVM, you
will
introduce boxing/unboxing IR instructions for JVM primitives.  Fixnum and
Array
are obvious candidates.

So, box and unbox IR instructions could be:
   n = box_primitive(Fixnum, v)
   v = unbox_primitive(Fixnum, n)

With this, the inlined and optimized code above would become:
   ...
   method_guard(Fixnum, '+', L1)
   n = box_primitive(Fixnum, 8)
   jump L2
L1:
   args = [5]
   n = ocall 3, '+', args
L2:
   ...

The next problem is to propagate type information throughout a method to
allow
optimizations to be as global as possible.  In the optimization analyses,
you
have the option of carrying around the state implicitly (carried along in
hashtables
while running analyses), or explicitly (by introducing attribute
instructions in
the IR).  The implicit approach is easy and preferable for cases where
information
about a variable is not use-site specific (constant propagation, for
example).
But for cases where this is not true, the explicit approach is preferable.
For
example, the type of an object along two paths of a branch would be
different.
So, a variable 'v' (in SSA form) might have type 'Fixnum' along one path and
'String' along another.

I prefer the explicit approach:
- they lend themselves to easier debugging because when you dump the IR, you
can
  see what information is being generated.
- you convert use-site specific information to attributes that are no longer
  use-site specific (because every branched path gets its own SSA
attribute).
- by converting site-use specific information to constants, you can
incorporate
  transformations as part of regular constant propagation via SCCP.

I have used this approach to good benefit in my JVM compiler I wrote for my
dissertation -- I used this approach to eliminate null checks and array
bounds checks.

So, with this in mind, you will now augment all IR instructions to include
an
array of attributes.  So, the generic IR form will now be:

    v = OP(operands, attributes)

When you run SCCP on this, you also run SCCP on the attribute values.  This
lets
you use attributes as additional meta-information to optimize that
instruction.

For example, consider this sequence of IR instructions:

   v = 5
   a = box_primitive(Fixnum, v, [])
   a_attr = [BOXED_FIXNUM(v)]
   ...
   ... a is not used anywhere here ...
   ...
   b = a
   b_attr = a_attr
   x = unbox_primitive(Fixnum, b, [b_attr])
   y = x

When you run SCCP on this, here is what happens to the unbox_primitive
instruction:
   x = unbox_primitive(Fixnum, b, [[BOXED_FIXNUM(v)]])

Thus, you know that you are unboxing a boxed value 'v'.  So, you can
simplify this
instruction to:
   x = v

So, when you run SCCP, the above sequence reduces to (I am excluding some
useless copies here)
   v = 5
   a = box_primitive(Fixnum, v, [])
   ...
   y = 5

After you run dead code elimination, assuming a is not used anywhere, the
code simplifies to
   v = 5
   y = 5

Summary:
--------
1. Every IR instruction is basically of the form
      v = OP(operand_array, attribute_array)

   Since this is a high-level IR, you are not bound to a 3-operand format.
   You can have an array of operands and an array of attributes.
   But, you can also have IR instructions of the form:
      v = OP(op1, attribute_array)
      v = OP(op1, op2, attribute_array)

2. You have regular ALU kind of IR instructions (add, sub, cmp, lshift,
etc.)

3. You have various flavor of call instructions:
     v = call(method, args)
     v = ocall(receiver, method, args)
     ... and anything else you can think of ...

4. You have a method-address instruction:
     m = meth_addr(receiver, method)

5. Instructions to extract function/method arguments and return function
values:
     x = meth_arg(arg_index)
     return y

6. You have various kinds of guard instructions:
     type_guard(obj, class, L)      <-- if type of obj is not class, jump to
L
     class_guard(class, L)          <-- if class has been monkey-patched,
jump to L
     method_guard(class, method, L) <-- if class.method has been
monkey-patched, jump to L

7. You have box/unbox instructions for primitives:
     n = box_primitive(class, v)
     v = unbox_primitive(class, n)

8. You have attribute instructions for passing around meta-information
(types, etc.)
     init_attr(a_attr, [])
     add_attr(a_attr, X)

   Note:
    - a_attr is just naming convention.  As far as representation is
concerned,
      it is just another SSA IR variable.
    - You need to identify the different types of meta-information you might
pass
      around.  The X above is a place-holder for that.

    Some kinds of X are:
      - BOXED_FIXNUM(n)
      - BOXED_ARRAY(a)
      - HAS_TYPE(class)
         ...

This is it for the 1st pass.  More later with an example of closures.

Reply via email to