Subramanya Sastry wrote:
This note interweaves code optimization with discussion about IR because
the IR choice will be influenced by what you want to do with it.  I am
going to use examples below to guide this discussion.  A lot of it is
somewhat 'stream-of-thought' writing -- this is not a tight spec.

Thanks for jumping in, Subbu :)

Here 'call' could be a regular C call for all you care.  So, what would
happen in the compiler is that you would first transform the code into
a higher-level IR (with virtual calls as above), and at some point in the
optimization, you will ratchet the IR down one level, which might
expose other optimizations such as this one.  If you took care to maintain
the SSA-ness during transformations, optimizations such as eliminating
duplicate method lookups can be done by the SCCP (sparse conditional constant
propagation) algorithm, which is linear and pretty fast for all practical
purposes.

Yes, this fits what I was thinking as well. I believe part of our problem in optimizing Ruby is that we only ever represent things at the highest level, ignoring the fact that there are very important low-level operations (like function retrieval) that should enter into analysis. So in essence, we've been going straight from our high-level, coarse-grained IR (the AST) to the target language (JVM bytecode) with only primitive analysis in between. What we want is to decompose the high-level IR (perhaps after first making it a bit nicer than an AST) into the *actual* component operations, and to perform multiple levels of analysis before producing the eventual result. Yes, I see it.
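
As a strawman for that ratcheting step, here's a rough sketch of a per-instruction lowering (the classes and names are hypothetical, nothing in our tree today). Splitting an ocall into an explicit lookup plus a direct call is exactly what makes duplicate lookups visible to SCCP/CSE:

    import java.util.Arrays;
    import java.util.List;

    class Lowering {
        // High-level:  v = ocall(receiver, 'meth', args)
        // Low-level:   m = meth_addr(receiver, 'meth')
        //              v = call(m, receiver, args)
        static List<String> lowerOcall(String result, String receiver,
                                       String method, String args) {
            // Naming the temp by (receiver, method) makes the duplication
            // obvious here; a real lowering would emit fresh SSA temps and
            // let value numbering discover that the two lookups are equal.
            String m = "m_" + receiver + "_" + method;
            return Arrays.asList(
                m + " = meth_addr(" + receiver + ", '" + method + "')",
                result + " = call(" + m + ", " + receiver + ", " + args + ")");
        }

        public static void main(String[] args) {
            // Two ocalls against the same receiver and method lower to the
            // same meth_addr expression -- one of them can be eliminated.
            lowerOcall("v1", "o", "foo", "args1").forEach(System.out::println);
            lowerOcall("v2", "o", "foo", "args2").forEach(System.out::println);
        }
    }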

This also enters into what we talked about at the coffee shop: because MRI is green-threaded, we have some implicit boundaries at which we must propagate cross-thread changes to global state (like method tables). If we represent these operations at the IR level, we may be able to coarsen things like method-override checks until the next implicit thread-context boundary. Very nice.

Open classes & JIT => All optimizations have to be guarded
----------------------------------------------------------
...
But, if you are going to JIT Ruby code and enter JVM/HotSpot land, you have
to build in optimization guards throughout the code.  This means the IR has
to incorporate guard instructions. For example, here are a few types of guards:

Ahh...yes, this is exactly what I had in mind!

   type_guard(o, Fixnum, L)      <-- if type of o is not Fixnum, jump to L
   class_guard(Fixnum, L)        <-- if Fixnum has been monkey-patched, jump to L
   method_guard(Fixnum, '+', L)  <-- if Fixnum.+ has been monkey-patched, jump to L

What we have been doing currently, with our very limited optimizations, is just "crossing our fingers" and hoping things like Fixnum#+ don't get overridden. But I think if we're able to make these sorts of guards just another part of the IR, we'll suddenly have much better opportunities to do the right thing every time, without having to make assumptions. The guards will perform their checks, the results will propagate through the code, and when we hand off to HotSpot most of them will get folded away.
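
To make the shape concrete, a method_guard could lower to something like this on the JVM. The generation counter here is a hypothetical stand-in for however we end up tracking method-table changes; HotSpot can fold the check away wherever it can prove the counter stable:

    // Sketch only: 'fixnumGeneration' is a made-up mechanism, not JRuby's.
    class GuardedAdd {
        static volatile int fixnumGeneration = 0;  // bumped on monkey-patching
        static final int EXPECTED_GEN = 0;         // generation at compile time

        interface Fallback { long call(long a, long b); }

        static long add(long a, long b, Fallback slow) {
            if (fixnumGeneration != EXPECTED_GEN) {
                return slow.call(a, b);  // guard failed: full dynamic dispatch
            }
            return a + b;                // fast path: primitive add
        }
    }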

I'm also proceeding with this discussion from a perspective that ignores the JVM's limitations on bytecode size. Adding more logic to a method, such as fallback branches for when guards fail, will often bite us because of HotSpot's code-size and graph-complexity limits. But I think the opportunity here is to produce a flexible enough compiler chain that we can move that optimization bar up and down easily, taking advantage of newer JVMs as they come out or allowing more optimizations if additional HotSpot flags are provided.

The decomposition you're describing will definitely give us more flexibility.

Clearly, there is a lot of overhead unboxing the objects -- but that is because
the objects are being passed in as arguments.  But, you can see that if one or
both of this expression's operands were literals, the overhead drops.  If both
were literals, the code reduces to boxing the result into a Fixnum.  But, if
not, HotSpot might be able to inline the 'value' calls at this site and get rid
of the unboxing overhead.

Yes, especially in Java 7 where the available escape analysis is much more robust. If we are forced to allocate boxed values, we need to at least make *damn* sure we can inline the code that produces or consumes those boxed values, so we have a chance for them to compile and optimize as part of the same compilation unit.

The experimental "dyncall inlining" flag in 1.3 may already be enabling this, since I can see multiple levels of "fib" inline together into a single unit of asm. If escape analysis is working like we expect, we should see the boxing cost disappear for several levels of calls, only returning when we have to actually CALL into a new body or RET back to a previous one.

JRuby Inlining
--------------
Since HotSpot won't really be able to optimize away the unboxing and add
type-based guards as above, at some point you might have to embrace method
inlining.  For inlining, you will use the high-level IR.
...
Clearly, this is still simplistic because you might be boxing the 8 into a Fixnum prematurely, and it might never be needed. So, in order to be able to optimize boxing and unboxing, it might be useful to introduce IR instructions that encapsulate boxing and unboxing of primitive types. Since the target is the JVM, you will introduce boxing/unboxing IR instructions for JVM primitives. Fixnum and Array
are obvious candidates.

So, box and unbox IR instructions could be:
   n = box_primitive(Fixnum, v)
   v = unbox_primitive(Fixnum, n)

If I remember right, I've seen similar operations in MacRuby's calls into LLVM, and I presume they get the same optimization from them that we want.
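
For us, box_primitive and unbox_primitive for Fixnum would lower to calls we already make all over the generated code -- roughly this, if I have the RubyFixnum names right:

    import org.jruby.Ruby;
    import org.jruby.RubyFixnum;

    class Boxing {
        // n = box_primitive(Fixnum, v)
        static RubyFixnum box(Ruby runtime, long v) {
            return RubyFixnum.newFixnum(runtime, v);
        }

        // v = unbox_primitive(Fixnum, n)
        static long unbox(RubyFixnum n) {
            return n.getLongValue();
        }
    }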

With this, the inlined and optimized code above would become:
   ...
   method_guard(Fixnum, '+', L1)
   n = box_primitive(Fixnum, 8)
   jump L2
L1:
   args = [5]
   n = ocall 3, '+', args
L2:
   ...

Again, we have two perspectives:

If we assume no limitations on the code size and complexity we can generate (or if we assume we will always be able to work around those limitations) then emitting inlined logic with appropriate guards will be fairly easy. As you show here, we'll perform our method and type guards, with the fast path immediately following successful tests and a branch to a slow path following failed tests.

If we are pessimistic and assume that code size limitations are going to bite us, then we need to be fairly selective about what we inline. Math operations are obvious, as are other core class operations and frequently-used closure-receiving methods like Array#each. Again, having a flexible compiler chain will allow us to slide this bar up and down.

Reality will surely lie somewhere in between, but I think we should keep both extremes in mind while realizing there's more "future" than there is "present" waiting for the fruits of our labors :)
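
One way to keep that flexibility honest is to make the bar an explicit, tunable policy object rather than a hardcoded choice. A toy sketch -- every name and threshold here is invented:

    class InlinePolicy {
        int maxInlineSize = 32;  // slide up/down per target JVM or HotSpot flags

        boolean shouldInline(String klass, String method, int calleeSize) {
            if (isCoreMath(klass, method)) return true;           // Fixnum#+ etc.
            if (isHotClosureReceiver(klass, method)) return true; // Array#each etc.
            return calleeSize <= maxInlineSize;                   // everything else
        }

        private boolean isCoreMath(String k, String m) {
            return k.equals("Fixnum")
                && (m.equals("+") || m.equals("-") || m.equals("*") || m.equals("<"));
        }

        private boolean isHotClosureReceiver(String k, String m) {
            return k.equals("Array") && m.equals("each");
        }
    }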

The next problem is to propagate type information throughout a method so that optimizations can be as global as possible. In the optimization analyses, you have the option of carrying the state around implicitly (in hashtables while running the analyses) or explicitly (by introducing attribute instructions in the IR). The implicit approach is easy and preferable where the information about a variable is not use-site specific (constant propagation, for example). But where that is not true, the explicit approach is preferable: for example, the type of an object can differ along the two paths of a branch, so a variable 'v' (in SSA form) might have type 'Fixnum' along one path and 'String' along another.

I prefer the explicit approach:
- it lends itself to easier debugging, because when you dump the IR you can
  see what information is being generated.
- use-site-specific information becomes attributes that are no longer
  use-site specific (because every branched path gets its own SSA attribute).
- by converting use-site-specific information to constants, you can incorporate
  the transformations into regular constant propagation via SCCP.

I used this approach to good effect in the JVM compiler I wrote for my
dissertation, where it eliminated null checks and array bounds checks.

Yes, it seems to me the explicit approach would provide a much wider set of opportunities for us. A key example is turning this:

5.times {
  # stuff
}

... or this ...

a = 0
while a < 5
  # stuff
  a += 1
end

...into a primitive loop. That's only possible if we can tell that at each of these sites the counter is always provably a Fixnum (assuming Fixnum's operations have not been violated), and then produce a call-free, boxing-free loop. This is where we get into the real magic that a good IR and tool chain can give us; with solid type propagation, these kinds of transformations just fall out.
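
The end state for either form, in Java terms, is nothing more than this:

    class ReducedLoop {
        static void run() {
            // no dynamic calls, no boxing, just a primitive counter
            for (long a = 0; a < 5; a++) {
                // stuff
            }
        }
    }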

So, with this in mind, you will now augment all IR instructions to include an
array of attributes.  So, the generic IR form will now be:

    v = OP(operands, attributes)

When you run SCCP on this, you also run SCCP on the attribute values. This lets you use attributes as additional meta-information to optimize that instruction.

For example, consider this sequence of IR instructions:
   v = 5
   a = box_primitive(Fixnum, v, [])
   a_attr = [BOXED_FIXNUM(v)]
   ...
   ... a is not used anywhere here ...
   ...
   b = a
   b_attr = a_attr
   x = unbox_primitive(Fixnum, b, [b_attr])
   y = x

When you run SCCP on this, here is what happens to the unbox_primitive instruction:
   x = unbox_primitive(Fixnum, b, [[BOXED_FIXNUM(v)]])

Thus, you know that you are unboxing a boxed value 'v'. So, you can simplify this
instruction to:
   x = v

So, when you run SCCP, the above sequence reduces to (I am excluding some useless copies here)
   v = 5
   a = box_primitive(Fixnum, v, [])
   ...
   y = 5

After you run dead code elimination, assuming a is not used anywhere, the code simplifies to
   v = 5
   y = 5
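
This maps pretty directly onto a data structure. Here's a rough sketch of the generic v = OP(operands, attributes) form plus the unbox simplification rule you describe -- all naming is hypothetical, none of this exists in JRuby today:

    class Instr {
        final String result;      // the SSA variable being defined (v), or null
        final String op;          // "ocall", "box_primitive", "copy", ...
        final Object[] operands;  // array of regular operands
        final Object[] attrs;     // array of attributes -- SCCP folds these too

        Instr(String result, String op, Object[] operands, Object[] attrs) {
            this.result = result;
            this.op = op;
            this.operands = operands;
            this.attrs = attrs;
        }
    }

    class BoxedFixnum {
        final String sourceVar;   // the v in BOXED_FIXNUM(v)
        BoxedFixnum(String v) { sourceVar = v; }
    }

    class UnboxRule {
        // x = unbox_primitive(Fixnum, b, [BOXED_FIXNUM(v)])  ==>  x = v
        static Instr simplify(Instr unbox) {
            Object attr = unbox.attrs.length > 0 ? unbox.attrs[0] : null;
            if (attr instanceof BoxedFixnum) {
                String v = ((BoxedFixnum) attr).sourceVar;
                return new Instr(unbox.result, "copy",
                                 new Object[] { v }, new Object[0]);
            }
            return unbox;  // attribute unknown; leave the unbox in place
        }
    }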

Here's an attempt to show this using my loop example from above:

Basic pseudo-IR

v = 0
a = box_primitive(Fixnum, v, [])
b = ocall a, '<', [box_primitive(Fixnum, 5, [])]
if_false [b, :end_label]
top_label: # stuff
a' = ocall a, '+', [box_primitive(Fixnum, 1, [])]
b' = ocall a', '<', [box_primitive(Fixnum, 5, [])]
if_true [b', :top_label]
end_label: .....

After propagating types

v = 0
a(Fixnum) = box_primitive(Fixnum, v, [])
b(Boolean) = ocall_typed Fixnum, a, '<', [box_primitive(Fixnum, 5, [])]
if_false [b, :end_label]
top_label: # stuff
a'(Fixnum) = ocall_typed Fixnum, a, '+', [box_primitive(Fixnum, 1, [])]
b'(Boolean) = ocall_typed Fixnum, a', '<', [box_primitive(Fixnum, 5, [])]
if_true [b', :top_label]
end_label: .....

And reducing/inlining based on those propagated types:

a = 0
if_ge a, 5, :end_label
top_label: # stuff
a' = add a, 1
if_lt a', 5, :top_label
end_label:

As long as Fixnum's *operations* remain as expected, this is 100% safe and legal.

Summary:
--------

This is a tremendous start! Thanks so much for jumping in with this. I think we're on the right track.

And this list is a perfect place for us to start prototyping a few simple localized examples...

1. Every IR instruction is basically of the form v = OP(operand_array, attribute_array)

   Since this is a high-level IR, you are not bound to a 3-operand format.
   You can have an array of operands and an array of attributes.
   But, you can also have IR instructions of the form:
      v = OP(op1, attribute_array)
      v = OP(op1, op2, attribute_array)

2. You have regular ALU-style IR instructions (add, sub, cmp, lshift, etc.)

3. You have various flavors of call instructions:
     v = call(method, args)
     v = ocall(receiver, method, args)
     ... and anything else you can think of ...

4. You have a method-address instruction:
     m = meth_addr(receiver, method)

5. Instructions to extract function/method arguments and return function values:
     x = meth_arg(arg_index)
     return y

6. You have various kinds of guard instructions:
     type_guard(obj, class, L)       <-- if type of obj is not class, jump to L
     class_guard(class, L)           <-- if class has been monkey-patched, jump to L
     method_guard(class, method, L)  <-- if class.method has been monkey-patched, jump to L

7. You have box/unbox instructions for primitives:
     n = box_primitive(class, v)
     v = unbox_primitive(class, n)

8. You have attribute instructions for passing around meta-information (types, etc.)
     init_attr(a_attr, [])
     add_attr(a_attr, X)

   Note:
   - a_attr is just a naming convention.  As far as representation is
     concerned, it is just another SSA IR variable.
   - You need to identify the different types of meta-information you might
     pass around.  The X above is a placeholder for that.  Some kinds of X:
       - BOXED_FIXNUM(n)
       - BOXED_ARRAY(a)
       - HAS_TYPE(class)
       ...

This is it for the 1st pass.  More later with an example of closures.
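
And to tie the catalog above to something we could prototype right away: the guarded fast path from the inlining example falls out of the same Instr sketch from earlier (names still hypothetical):

    import java.util.Arrays;
    import java.util.List;

    class Catalog {
        static List<Instr> guardedAdd() {
            return Arrays.asList(
                new Instr(null, "method_guard",
                          new Object[] { "Fixnum", "+", "L1" }, new Object[0]),
                new Instr("n", "box_primitive",
                          new Object[] { "Fixnum", 8 }, new Object[0]),
                new Instr(null, "jump",
                          new Object[] { "L2" }, new Object[0]));
        }
    }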

