Assume a closure is implemented as follows:

    class Closure {
        public Scope  s;    // Enclosing scope
        public Method meth;

        Closure(Scope s, Method m)   { ... }
        public Object yield(Object[] args) { return meth.call(s, args); }
    }

At a call site that passes a block, you have the following code:

    s = new Scope(...)
    c = new Closure(s, m)

Here m is the block in method form.  You then pass c as the block argument
of the method.
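
To make the call-site protocol concrete, here is a minimal Ruby model of the
same idea.  It is only a sketch: Scope, Closure and each_value are
illustrative stand-ins (not JRuby's real classes), and the block body is
modelled as a lambda that takes the scope as its first argument.

```ruby
# Hypothetical model of the Scope/Closure pair; not JRuby's real classes.
class Scope
  def initialize(vars = {})
    @vars = vars
  end

  def [](name)
    @vars.fetch(name)
  end
end

class Closure
  attr_reader :scope, :meth

  def initialize(scope, meth)
    @scope = scope
    @meth  = meth   # the block in "method form": takes (scope, args)
  end

  # Run the block body against the enclosing scope and return its value.
  def yield(args)
    @meth.call(@scope, args)
  end
end

# A method that receives the closure as its block argument (assumed name).
def each_value(values, closure)
  values.map { |v| closure.yield([v]) }
end

# Call site: build the scope, wrap the block, pass the closure along.
s = Scope.new({ factor: 3 })
m = ->(scope, args) { args[0] * scope[:factor] }
c = Closure.new(s, m)

result = each_value([1, 2, 3], c)
```

Here result holds each value multiplied by the factor stored in the
enclosing scope, which is the only state the block reads from outside itself.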

Let us now work out examples to see how this translates into an IR and how
we can optimize closures.

Example 2: Sum of squares from 1 to n using blocks (no scope variables used)
----------------------------------------------------------------------------

  sum = (1..n).inject(0) { |s,i| s + i*i }

IR: Straightforward transformation
----------------------------------
       ## Build the range
  a = boxed_range(Fixnum, 1, n)

       ## Construct the closure
  c = begin_closure
    s = closure_arg[0]
    i = closure_arg[1]
    args[0] = i
    v1 = ocall(i, '*', args)
    args[0] = v1
    s = ocall(s, '+', args)
     closure_return s
  end_closure

       ## Call inject
  args[0] = 0
  args[1] = c
  sum = ocall(a, 'inject', args)

If you can determine that the closure doesn't use any variables from its
surrounding scope, you can use a static closure object for the closure/block
rather than having to allocate a scope and a closure object for every
instance of its parent method's execution.  This transformation is
relatively straightforward.  If there are no def-use chains that cross the
closure boundary, you go home scot-free.  In that case, you allocate a
global closure object with a null scope and the closure method.  This
immediately eliminates the heap overhead.
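
A rough way to see the saving is to count closure allocations.  The sketch
below is a Ruby model under stated assumptions: CountedClosure,
SQUARE_AND_ADD, STATIC_CLOSURE and sum_with_closure are hypothetical names,
and the allocation counter exists only for illustration.

```ruby
# Hypothetical closure class that counts how many instances get allocated.
class CountedClosure
  class << self
    attr_accessor :allocations
  end
  self.allocations = 0

  def initialize(scope, meth)
    @scope = scope
    @meth  = meth
    CountedClosure.allocations += 1
  end

  def yield(args)
    @meth.call(@scope, args)
  end
end

# Block body in method form; it never touches its scope argument.
SQUARE_AND_ADD = ->(_scope, args) { args[0] + args[1] * args[1] }

# No def-use chain crosses the closure boundary, so one shared instance
# with a nil scope can serve every call of the parent method.
STATIC_CLOSURE = CountedClosure.new(nil, SQUARE_AND_ADD)

def sum_with_closure(n, closure)
  (1..n).inject(0) { |s, i| closure.yield([s, i]) }
end

# Naive version: a fresh closure object for each call of the parent method.
before = CountedClosure.allocations
naive  = 3.times.map { sum_with_closure(4, CountedClosure.new(nil, SQUARE_AND_ADD)) }
naive_allocations = CountedClosure.allocations - before

# Static version: the shared closure is reused, nothing new is allocated.
before = CountedClosure.allocations
shared = 3.times.map { sum_with_closure(4, STATIC_CLOSURE) }
shared_allocations = CountedClosure.allocations - before
```

Both versions compute the same sums; only the number of closure objects
created per call differs.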

Since you know that a is a Range object, you can do more interesting things,
as follows.  Within the compiler, you have stubs for JRuby implementations
of various classes (Array, String, Range, etc.) which can take the IR for a
closure and return either a new inlined IR or a method call IR (effectively
refusing to inline the closure, for example because the closure was too big,
or for whatever other reason).  This has the benefit of eliminating a whole
range of additional intermediate analyses (i.e. inline the inject first, and
then inline the closure in place of the yield).
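
One possible shape for such a stub, sketched in Ruby.  Everything here is an
assumption made for illustration: IR is modelled as an array of strings, and
ClosureIR, RangeStub, inject_ir, MAX_INLINE_SIZE, the idx counter and the
jump instruction are hypothetical names, not part of JRuby.

```ruby
# Hypothetical model: a closure is its parameter names, body IR and result.
ClosureIR = Struct.new(:params, :body, :result)

class RangeStub
  MAX_INLINE_SIZE = 8   # assumed threshold for "too big to inline"

  # IR for `dest = (first..last).inject(init) { ... }`: either a loop with
  # the closure body spliced in, or a plain call if the stub refuses.
  def inject_ir(first, last, init, closure, dest)
    return call_ir(init, dest) if closure.body.size > MAX_INLINE_SIZE

    acc, elem = closure.params
    [
      "#{dest} = #{init}",
      "idx = #{first}",
      "last = #{last}",
      "L1:",
      "bgt(idx, last, L2)",
      "#{acc} = #{dest}",       # bind the block's accumulator argument
      "#{elem} = idx",          # bind the block's element argument
      *closure.body,
      "#{dest} = #{closure.result}",
      "idx = idx + 1",
      "jump L1",
      "L2:"
    ]
  end

  private

  # Fallback: leave the call in place and pass the closure as an argument.
  def call_ir(init, dest)
    ["args[0] = #{init}", "args[1] = c", "#{dest} = ocall(a, 'inject', args)"]
  end
end

square_and_add = ClosureIR.new(
  %w[s i],
  ["args[0] = i", "v1 = ocall(i, '*', args)", "args[0] = v1", "s = ocall(s, '+', args)"],
  "s"
)
huge = ClosureIR.new(%w[s i], ["nop"] * 20, "s")

stub    = RangeStub.new
inlined = stub.inject_ir(1, "n", 0, square_and_add, "sum")
refused = stub.inject_ir(1, "n", 0, huge, "sum")
```

The caller does not need to know which answer it got back: in both cases it
receives IR that it can splice in at the call site.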

So, if you did that, you might get the following code now:

       ## Build the range
  a = boxed_range(Fixnum, 1, n)
  c = begin_closure
  ...
  end_closure

  sum  = 0                ## Initial value passed to inject
  i    = 1                ## Regular integer
  last = n
L1:
  bgt(i, last, L2)
  val = i
  s   = sum
  i_2 = val
  args[0] = i_2
  v1 = ocall(i_2, '*', args)
  args[0] = v1
  s = ocall(s, '+', args)
  sum = s

  i = i + 1                ## Integer arithmetic!
  jump L1                  ## Loop back to the bound check
L2:
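
For comparison, here is roughly what the inlined loop computes, written back
in Ruby next to the original inject form.  This is only a sketch to check
the intent of the IR; the method names are made up for the example.

```ruby
# Original form: the block is passed to inject.
def sum_of_squares_inject(n)
  (1..n).inject(0) { |s, i| s + i * i }
end

# Inlined form: explicit counter loop with the block body spliced in.
def sum_of_squares_inlined(n)
  sum  = 0
  i    = 1
  last = n
  while i <= last      # bgt(i, last, L2) exits the loop
    s   = sum
    v1  = i * i
    s   = s + v1
    sum = s
    i  += 1
  end
  sum
end
```

Both methods should agree for every n, including n = 0, where the loop body
never runs.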

Now, you can get rid of the closure creation since 'c' is not used anywhere
within the method.  Thus, the following 3-step process can effectively
inline closures for common classes and get rid of closure overhead in a lot
of cases.

1. Represent a closure as c = begin_closure ... end_closure
2. Where you accurately know the class of the method's receiver and where
   there is a JRuby implementation of that class, you ask the class to give
   you the IR for the method with the closure inlined
3. Live variable analysis and dead code elimination get rid of the closure
   creation.
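
Step 3 can be sketched as a backward liveness pass.  The Ruby below is a
deliberately simplified model: it only handles straight-line code (a real
loop needs an iterative analysis), and Instr, eliminate_dead_code and the
side_effect flag are hypothetical names.  It also assumes that closure
creation and boxed_range have no side effects, while ocall does.

```ruby
# Hypothetical IR instruction: the variable it defines, the variables it
# uses, whether it has side effects, and its printed form.
Instr = Struct.new(:dest, :uses, :side_effect, :text)

# Walk backwards, keeping an instruction only if it has a side effect or
# defines a variable that is still live at that point.
def eliminate_dead_code(instrs, live_out)
  live = live_out.dup
  kept = []
  instrs.reverse_each do |ins|
    next unless ins.side_effect || live.include?(ins.dest)
    live = (live - [ins.dest]) | ins.uses
    kept.unshift(ins)
  end
  kept
end

ir = [
  Instr.new('a',   ['n'],         false, "a = boxed_range(Fixnum, 1, n)"),
  Instr.new('c',   [],            false, "c = begin_closure ... end_closure"),
  Instr.new('i',   [],            false, "i = 1"),
  Instr.new('v1',  ['i'],         true,  "v1 = ocall(i, '*', args)"),
  Instr.new('sum', ['sum', 'v1'], true,  "sum = ocall(sum, '+', args)")
]

kept = eliminate_dead_code(ir, ['sum'])
```

In this model nothing uses c after inlining, so its creation is dropped; the
boxed range a goes the same way, under the no-side-effects assumption above.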

Example 3: Sum of squares from 1 to n using blocks (use of scope variables)
---------------------------------------------------------------------------

Now, consider this non-functional way of implementing the sum-of-squares:

  sum = 0
  (1..n).each { |i| sum += i*i }
  ... use sum here ...

Here sum is a variable that has "escaped" the block.  So, you are forced to
allocate a scope in this case (unless you can inline the call and the
block!)

IR: Straightforward transformation
----------------------------------
  sum = 0

       ## Build the range
  a = boxed_range(Fixnum, 1, n)

       ## Construct the closure
  c = begin_closure
    i = closure_arg[0]
    args[0] = i
    v1 = ocall(i, '*', args)
    args[0] = v1
    sum = ocall(sum, '+', args)
     closure_return sum
  end_closure

       ## Call each
  args[0] = c
  ocall(a, 'each', args)

Using the technique explained in the previous example, you would be able to
inline the call and eliminate the closure, and no one would be the wiser ...

But, assuming you couldn't inline and eliminate the closure, every reference
to sum inside and outside the closure would have to be replaced by
load_scope_var and store_scope_var IR instructions.  This ensures that the
use of 'sum' outside the block gets the correct value.
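
A small Ruby model shows why the indirection matters.  This is a sketch
under assumptions: HeapScope, sum_via_scope and sum_via_copy are
hypothetical names, with load and store standing in for load_scope_var and
store_scope_var.

```ruby
# Hypothetical heap-allocated scope shared by the method and its block.
class HeapScope
  def initialize
    @vars = {}
  end

  def load(name)            # stands in for load_scope_var
    @vars.fetch(name)
  end

  def store(name, value)    # stands in for store_scope_var
    @vars[name] = value
  end
end

# Every access to sum, inside and outside the block, goes via the scope.
def sum_via_scope(n)
  scope = HeapScope.new
  scope.store(:sum, 0)                               # sum = 0
  block = lambda do |s, i|
    s.store(:sum, s.load(:sum) + i * i)              # sum += i*i
  end
  (1..n).each { |i| block.call(scope, i) }
  scope.load(:sum)                                   # ... use sum here ...
end

# What goes wrong if the block only updates a private copy of sum.
def sum_via_copy(n)
  sum = 0
  block = ->(copy, i) { copy + i * i }               # result is discarded
  (1..n).each { |i| block.call(sum, i) }
  sum
end
```

With the shared scope the parent sees the accumulated total; with the
private copy it still sees the initial 0.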

Okay, that is it for this email.  Late in the night :-)   Hopefully, there
aren't any serious gaffes here.
