Forwarding a conversation we've been having about IR2JVM work.

The JRuby "IR" (Intermediate Representation) is our effort to build a
completely new interpreter + compiler + runtime using a more optimizable
structure. It currently can run RubySpecs green in 1.8 and 1.9 modes, which
is a big milestone. I'm working on the new JIT that will go from IR to JVM
bytecode, since the new JIT will be much simpler and we'll rely on IR
manipulations to perform optimizations like constant propagation, dead code
elimination, and so on. Subbu has been leading the effort with Tom doing a
lot of work on interpretation and me slowly getting into compilation. Subbu
has a blog post about IR coming soon.

Anyway, you can read this conversation and we'll try to have future
conversations about IR on this list too.

- Charlie

Forwarded conversation
Subject: A bit more IR2JVM work on the flight home
------------------------

From: *Charles Oliver Nutter* <head...@headius.com>
Date: Sun, Apr 22, 2012 at 8:58 PM
To: Subramanya Sastry <sss.li...@gmail.com>, Thomas Enebo <
tom.en...@gmail.com>


https://gist.github.com/2466877

Good old fib again. -X+CIR is faster than -X+C but it's not doing all
appropriate call site guarding. The main change I made was to get them
binding "properly" with invokedynamic rather than binding to a slow
lookup.

So, +CIR is basically around as fast as +C at the moment (or a little
faster because it's not 100% correct yet).

- Charlie

----------
From: *Charles Oliver Nutter* <head...@headius.com>
Date: Sun, Apr 22, 2012 at 9:12 PM
To: Subramanya Sastry <sss.li...@gmail.com>, Thomas Enebo <
tom.en...@gmail.com>


With appropriate guards added, it's just about identical perf to
current compiler:

system ~/projects/jruby $ time jruby -X+CIR -e "def fib(a); if a < 2;
a; else; fib(a - 1) + fib(a - 2); end; end; p fib(35); p fib(35); p
fib(35); p fib(35); p fib(35)"
9227465
9227465
9227465
9227465
9227465

real    0m3.120s
user    0m4.087s
sys     0m0.113s

- Charlie

----------
From: *Charles Oliver Nutter* <head...@headius.com>
Date: Sun, Apr 22, 2012 at 9:58 PM
To: Subramanya Sastry <sss.li...@gmail.com>, Thomas Enebo <
tom.en...@gmail.com>


I started adding support for constants, but discovered that the IR has
broken constant lookup into its component pieces. There's nothing
wrong with this, per se, but for the JVM compiler I hid all that logic
behind invokedynamic-based caching, and having it broken out in the IR
means there's many instructions for what would be a single
invokedynamic operation for me in the current compiler.

I started implementing the pieces, and almost have it working, but I
wanted to bring this up so I can learn about the gains from having the
parts of constant lookup split out in the IR. What are they?

- Charlie

On Sun, Apr 22, 2012 at 11:12 PM, Charles Oliver Nutter

----------
From: *Subramanya Sastry* <sss.li...@gmail.com>
Date: Mon, Apr 23, 2012 at 12:42 AM
To: Charles Oliver Nutter <head...@headius.com>
Cc: Thomas Enebo <tom.en...@gmail.com>



First of all, good to hear about the near identical performance when IR is
compiled down.  So, at least for that microbenchmark, all info is getting
through properly which is good to know.

As for constant lookup, the primary reason I split it up was for the JIT,
because it actually hurts the interpreter which wants fewer instructions.
Well, to be honest, there were two other minor motivations which I'll get
out of the way first since they are not as important.  (a) To more
accurately reflect constant lookup semantics for every constant reference
-- having explicit lexical search, inheritance search, and const missing
calls exposes "primitive" operations via the IR rather than hiding them in
the runtime (b) Some const lookups (colon2, colon3) only do inheritance
lookup, and it seems like less work to do (though mostly insignificant) and
also because it more accurately reflects what is going on as in (a).
This could be a good debugging benefits for a Ruby programmer, for ex. to
know why their const lookup failed.

Now, the other reason why I split it is because sometimes you could
optimize individual pieces of constant lookup -- for example, determine
that lexical lookup will never/always succeed and ignore one or the other
pieces and hardwire that result.   Ex: class C; def self.foo; 5; end; end;
p C.foo .. this lookup of C will always succeed lexically.  You can imagine
the other scenario.  A lot of constant lookups probably fall in these 2
categories.  So, this is an opportunity for optimization (which hasn't been
implemented yet) once we verify that we have the semantics right.

On the other hand, it is true that you can hide the whole thing behind a
indy call site -- but, it is possible to do the same with the individual
pieces, but probably more runtime code.  With Dalvik without indy support,
the story is a little different, and I think there are benefits from
individual optimizable lookup pieces.

Anyway, at this time, I am not sure if this is helping more or getting in
the way more -- something we can investigate/experiment with.

Subbu.

----------
From: *Charles Oliver Nutter* <head...@headius.com>
Date: Mon, Apr 23, 2012 at 3:29 AM
To: Subramanya Sastry <sss.li...@gmail.com>


On Mon, Apr 23, 2012 at 12:42 AM, Subramanya Sastry <sss.li...@gmail.com>
wrote:
>
> First of all, good to hear about the near identical performance when IR is
> compiled down.  So, at least for that microbenchmark, all info is getting
> through properly which is good to know.
>
> As for constant lookup, the primary reason I split it up was for the JIT,
> because it actually hurts the interpreter which wants fewer instructions.
> Well, to be honest, there were two other minor motivations which I'll get
> out of the way first since they are not as important.  (a) To more
> accurately reflect constant lookup semantics for every constant reference
--
> having explicit lexical search, inheritance search, and const missing
calls
> exposes "primitive" operations via the IR rather than hiding them in the
> runtime (b) Some const lookups (colon2, colon3) only do inheritance
lookup,
> and it seems like less work to do (though mostly insignificant) and also
> because it more accurately reflects what is going on as in (a).    This
> could be a good debugging benefits for a Ruby programmer, for ex. to know
> why their const lookup failed.

Having the steps of colon2 and colon3 split out into IR operations
will indeed make them easier for me to optimize, since retrieving the
LHS of the :: and then retrieving the RHS will both be seen as
separate operations. The current compiler is not quite as clean when
it comes to dealing with those two, and can only optimize "simple"
const lookup with indy.

But, while I appreciate that, and the fact that more logic is split
out in the instructions for the JIT...it does lead to a lot more
bytecode being generated. Example: https://gist.github.com/2468662

Of course I don't *know that it's too much code, but some methods have
a lot of constants in them...it's hard to say.

Anyway, I got hung up implementing constant lookup because of all the
instructions; I set aside time this afternoon to implement a couple
instrs, and it kept becoming more and more :) I'll see if I can't
implement the full const lookup logic tomorrow.

> Now, the other reason why I split it is because sometimes you could
optimize
> individual pieces of constant lookup -- for example, determine that
lexical
> lookup will never/always succeed and ignore one or the other pieces and
> hardwire that result.   Ex: class C; def self.foo; 5; end; end; p C.foo ..
> this lookup of C will always succeed lexically.  You can imagine the other
> scenario.  A lot of constant lookups probably fall in these 2 categories.
> So, this is an opportunity for optimization (which hasn't been implemented
> yet) once we verify that we have the semantics right.

Yes, that's probably true, but the problem I always ran into trying to
optimize one side or the other was mostly because lexical lookup comes
first. Because of that, any of the lexically-enclosing modules could
gain that constant at any time. This is ultimately why we only have a
global invalidation for constants, and why I've never had much luck
splitting the logic into pieces...they all have to check the same
single token to cache anything, so I just stick them all behind the
cache and validate only once that way.

> On the other hand, it is true that you can hide the whole thing behind a
> indy call site -- but, it is possible to do the same with the individual
> pieces, but probably more runtime code.  With Dalvik without indy support,
> the story is a little different, and I think there are benefits from
> individual optimizable lookup pieces.
>
> Anyway, at this time, I am not sure if this is helping more or getting in
> the way more -- something we can investigate/experiment with.

Sure, it's not insurmountable. The use of this special UNDEFINED value
probably needs to change to RubyObject.UNDEF or similar, so I have
something I can more easily lookup in the compiler. And the constant
lookup interpreter impls use some runtime checks against operations
I'm going to have to think about for the JVM backend. But it's all
doable, and most of it probably boils down to a single call in each
case and some way of recognizing failed lookups.

- Charlie

----------
From: *Subramanya Sastry* <sss.li...@gmail.com>
Date: Mon, Apr 23, 2012 at 4:23 AM
To: Charles Oliver Nutter <head...@headius.com>




Having the steps of colon2 and colon3 split out into IR operations
> will indeed make them easier for me to optimize, since retrieving the
> LHS of the :: and then retrieving the RHS will both be seen as
> separate operations. The current compiler is not quite as clean when
> it comes to dealing with those two, and can only optimize "simple"
> const lookup with indy.
>
> But, while I appreciate that, and the fact that more logic is split
> out in the instructions for the JIT...it does lead to a lot more
> bytecode being generated. Example: https://gist.github.com/2468662
>
> Of course I don't *know that it's too much code, but some methods have
> a lot of constants in them...it's hard to say.
>


Let's see how this actually behaves when we get things running and what
this does to performance.  Not too hard to roll back, if necessary.


Anyway, I got hung up implementing constant lookup because of all the
> instructions; I set aside time this afternoon to implement a couple
> instrs, and it kept becoming more and more :) I'll see if I can't
> implement the full const lookup logic tomorrow.
>
> > Now, the other reason why I split it is because sometimes you could
> optimize
> > individual pieces of constant lookup -- for example, determine that
> lexical
> > lookup will never/always succeed and ignore one or the other pieces and
> > hardwire that result.   Ex: class C; def self.foo; 5; end; end; p C.foo
> ..
> > this lookup of C will always succeed lexically.  You can imagine the
> other
> > scenario.  A lot of constant lookups probably fall in these 2 categories.
> > So, this is an opportunity for optimization (which hasn't been
> implemented
> > yet) once we verify that we have the semantics right.
>
> Yes, that's probably true, but the problem I always ran into trying to
> optimize one side or the other was mostly because lexical lookup comes
> first. Because of that, any of the lexically-enclosing modules could
> gain that constant at any time.


Is this always true, i.e., does this dynamic addition of constants affect
all lexical const lookups?  I would like to understand this better tomorrow.
Are you ever on IM?  I have yahoo/gchat (subbu_ss/sss.lists).  If you want
to discuss on IRC, we can do that as well.

Subbu.

----------
From: *Subramanya Sastry* <sss.li...@gmail.com>
Date: Thu, Apr 26, 2012 at 1:16 AM
To: Charles Oliver Nutter <head...@headius.com>
Cc: Thomas Enebo <tom.en...@gmail.com>



I added back a SearchConstInstr which does all 3 things (lexical lookup,
inheritance lookup, const_missing invocation).   I didn't touch colon2 and
colon3 however.  They continue to use split instructions for
InheritanceSearch and ConstMissing.   You also wanted UndefinedValue to
move to RubyObject.  Yet to be done -- it is still used in the colon2 and
colon3 cases.  -S.

Reply via email to