Forwarding a conversation we've been having about IR2JVM work. The JRuby "IR" (Intermediate Representation) is our effort to build a completely new interpreter + compiler + runtime on a more optimizable structure. It can currently run RubySpecs green in 1.8 and 1.9 modes, which is a big milestone. I'm working on the new JIT that will go from IR to JVM bytecode; it will be much simpler than the current compiler, since we'll rely on IR manipulations to perform optimizations like constant propagation, dead code elimination, and so on. Subbu has been leading the effort, with Tom doing a lot of work on interpretation and me slowly getting into compilation. Subbu has a blog post about IR coming soon.
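For anyone new to the list: IR-level optimizations like the ones mentioned above work over flat sequences of simple instructions rather than an AST. Here is a toy sketch in plain Ruby of constant propagation followed by dead code elimination. The instruction encoding and names are made up for illustration; this is not JRuby's actual IR.

```ruby
# Toy straight-line "IR": each instruction is [op, dest, *args].
# NOT JRuby's real IR classes -- purely illustrative.

# Constant propagation: track variables with known constant values,
# substitute them into later uses, and fold arithmetic when possible.
def constant_propagate(instrs)
  env = {}   # variable symbol -> known Integer value
  out = []
  instrs.each do |op, dest, *args|
    args = args.map { |a| env.fetch(a, a) }   # substitute known constants
    case op
    when :copy
      args[0].is_a?(Integer) ? env[dest] = args[0] : env.delete(dest)
      out << [:copy, dest, args[0]]
    when :add
      if args.all? { |a| a.is_a?(Integer) }
        env[dest] = args[0] + args[1]          # fold to a constant copy
        out << [:copy, dest, env[dest]]
      else
        env.delete(dest)
        out << [:add, dest, *args]
      end
    else
      out << [op, dest, *args]
    end
  end
  out
end

# Dead code elimination: drop pure assignments whose result is never
# read (treating :ret as the live result of the fragment).
def eliminate_dead(instrs)
  used = instrs.flat_map { |_op, _dest, *args| args }.select { |a| a.is_a?(Symbol) }
  instrs.reject do |op, dest, *|
    (op == :copy || op == :add) && !used.include?(dest) && dest != :ret
  end
end

ir  = [[:copy, :a, 2], [:copy, :b, 3], [:add, :c, :a, :b], [:copy, :ret, :c]]
opt = eliminate_dead(constant_propagate(ir))
# opt == [[:copy, :ret, 5]] -- the whole computation folded away
```

The point of going through an IR at all: each pass is a simple, local rewrite over instructions, instead of special-case logic buried in a bytecode emitter.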
Anyway, you can read this conversation, and we'll try to have future conversations about IR on this list too.

- Charlie

Forwarded conversation
Subject: A bit more IR2JVM work on the flight home
------------------------

From: *Charles Oliver Nutter* <head...@headius.com>
Date: Sun, Apr 22, 2012 at 8:58 PM
To: Subramanya Sastry <sss.li...@gmail.com>, Thomas Enebo <tom.en...@gmail.com>

https://gist.github.com/2466877

Good old fib again. -X+CIR is faster than -X+C, but it's not doing all appropriate call site guarding. The main change I made was to get calls binding "properly" with invokedynamic rather than binding to a slow lookup. So +CIR is basically around as fast as +C at the moment (or a little faster, because it's not 100% correct yet).

- Charlie

----------
From: *Charles Oliver Nutter* <head...@headius.com>
Date: Sun, Apr 22, 2012 at 9:12 PM
To: Subramanya Sastry <sss.li...@gmail.com>, Thomas Enebo <tom.en...@gmail.com>

With appropriate guards added, it's just about identical perf to the current compiler:

system ~/projects/jruby $ time jruby -X+CIR -e "def fib(a); if a < 2; a; else; fib(a - 1) + fib(a - 2); end; end; p fib(35); p fib(35); p fib(35); p fib(35); p fib(35)"
9227465
9227465
9227465
9227465
9227465

real 0m3.120s
user 0m4.087s
sys 0m0.113s

- Charlie

----------
From: *Charles Oliver Nutter* <head...@headius.com>
Date: Sun, Apr 22, 2012 at 9:58 PM
To: Subramanya Sastry <sss.li...@gmail.com>, Thomas Enebo <tom.en...@gmail.com>

I started adding support for constants, but discovered that the IR has broken constant lookup into its component pieces. There's nothing wrong with this per se, but for the JVM compiler I hid all that logic behind invokedynamic-based caching, and having it broken out in the IR means there are many instructions for what is a single invokedynamic operation in the current compiler.
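The invokedynamic binding discussed above amounts to a guarded, cached call site: a guard checks that the receiver still looks like what was cached, and falls back to slow lookup (and re-linking) on a miss. A minimal monomorphic sketch in plain Ruby -- the class and method names here are made up, and the real thing is JVM-level indy plumbing, not a Ruby object:

```ruby
# Illustrative monomorphic inline cache. NOT JRuby's actual call site
# implementation -- just the shape of the guard/fallback logic.
class CallSite
  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
  end

  # Fast path: if the receiver's class matches the guard, invoke the
  # cached method directly. Slow path: look the method up, re-link the
  # cache to this receiver class, then invoke.
  def call(receiver, *args)
    unless receiver.class == @cached_class
      @cached_class  = receiver.class
      @cached_method = receiver.class.instance_method(@name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:upcase)
site.call("hello")   # slow path: links the cache to String
site.call("world")   # fast path: guard matches, cached method used
site.call(:sym)      # guard miss: re-links to Symbol
```

Without the guard ("not 100% correct yet" above), a cached target would keep being invoked even after the receiver class changes or is redefined, which is where the speed-without-correctness gap comes from.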
I started implementing the pieces, and almost have it working, but I wanted to bring this up so I can learn about the gains from having the parts of constant lookup split out in the IR. What are they?

- Charlie

----------
From: *Subramanya Sastry* <sss.li...@gmail.com>
Date: Mon, Apr 23, 2012 at 12:42 AM
To: Charles Oliver Nutter <head...@headius.com>
Cc: Thomas Enebo <tom.en...@gmail.com>

First of all, good to hear about the near-identical performance when IR is compiled down. At least for that microbenchmark, all the info is getting through properly, which is good to know.

As for constant lookup, the primary reason I split it up was for the JIT; it actually hurts the interpreter, which wants fewer instructions. To be honest, there were two other minor motivations, which I'll get out of the way first since they are not as important: (a) to more accurately reflect constant lookup semantics for every constant reference -- having explicit lexical search, inheritance search, and const_missing calls exposes "primitive" operations via the IR rather than hiding them in the runtime; (b) some const lookups (colon2, colon3) only do inheritance lookup, so there is less work to do (though mostly insignificant), and it more accurately reflects what is going on, as in (a). This could also be a debugging benefit for a Ruby programmer, e.g. to know why their const lookup failed.

Now, the other reason I split it up is that sometimes you can optimize individual pieces of constant lookup -- for example, determine that lexical lookup will never/always succeed, ignore one or the other piece, and hardwire that result. Ex: class C; def self.foo; 5; end; end; p C.foo -- this lookup of C will always succeed lexically. You can imagine the other scenario. A lot of constant lookups probably fall into these two categories.
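The split Subbu describes -- explicit lexical search, then inheritance search, then const_missing -- can be sketched in plain Ruby using Module reflection. The search_const helper below is made up for illustration (real lookup walks the live cref chain, not an explicit array):

```ruby
# Sketch of the three phases of Ruby constant lookup that the IR splits
# into separate instructions. Illustrative only.
def search_const(lexical_scopes, klass, name)
  # Phase 1: lexical search, innermost enclosing scope outward.
  lexical_scopes.each do |mod|
    return mod.const_get(name, false) if mod.const_defined?(name, false)
  end
  # Phase 2: inheritance search over the class and its ancestors.
  klass.ancestors.each do |mod|
    return mod.const_get(name, false) if mod.const_defined?(name, false)
  end
  # Phase 3: give up and invoke const_missing (raises NameError by default).
  klass.const_missing(name)
end

module Outer
  X = 1
  class Inner; end
end

class Child
  Y = 2
end

search_const([Outer::Inner, Outer], Outer::Inner, :X)  # phase 1 hit
search_const([], Child, :Y)                            # phase 2 hit
```

A colon2/colon3 lookup would skip phase 1 entirely, which is the "(b)" motivation above: splitting the phases into IR instructions lets a qualified lookup simply not emit the lexical-search instruction.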
So, this is an opportunity for optimization (which hasn't been implemented yet) once we verify that we have the semantics right.

On the other hand, it is true that you can hide the whole thing behind an indy call site -- and it is possible to do the same with the individual pieces, just with more runtime code. On Dalvik, without indy support, the story is a little different, and I think there are benefits to individually optimizable lookup pieces.

Anyway, at this time, I am not sure if this is helping more or getting in the way more -- something we can investigate/experiment with.

Subbu.

----------
From: *Charles Oliver Nutter* <head...@headius.com>
Date: Mon, Apr 23, 2012 at 3:29 AM
To: Subramanya Sastry <sss.li...@gmail.com>

On Mon, Apr 23, 2012 at 12:42 AM, Subramanya Sastry <sss.li...@gmail.com> wrote:
> First of all, good to hear about the near identical performance when IR is
> compiled down. So, at least for that microbenchmark, all info is getting
> through properly which is good to know.
>
> As for constant lookup, the primary reason I split it up was for the JIT,
> because it actually hurts the interpreter which wants fewer instructions.
> Well, to be honest, there were two other minor motivations which I'll get
> out of the way first since they are not as important. (a) To more
> accurately reflect constant lookup semantics for every constant reference --
> having explicit lexical search, inheritance search, and const_missing calls
> exposes "primitive" operations via the IR rather than hiding them in the
> runtime (b) Some const lookups (colon2, colon3) only do inheritance lookup,
> and it seems like less work to do (though mostly insignificant) and also
> because it more accurately reflects what is going on as in (a). This
> could also be a debugging benefit for a Ruby programmer, e.g. to know
> why their const lookup failed.
Having the steps of colon2 and colon3 split out into IR operations will indeed make them easier for me to optimize, since retrieving the LHS of the :: and then retrieving the RHS will both be seen as separate operations. The current compiler is not quite as clean when it comes to dealing with those two, and can only optimize "simple" const lookup with indy.

But, while I appreciate that, and the fact that more logic is split out in the instructions for the JIT...it does lead to a lot more bytecode being generated. Example: https://gist.github.com/2468662

Of course I don't *know* that it's too much code, but some methods have a lot of constants in them...it's hard to say.

Anyway, I got hung up implementing constant lookup because of all the instructions; I set aside time this afternoon to implement a couple instrs, and it kept becoming more and more :) I'll see if I can't implement the full const lookup logic tomorrow.

> Now, the other reason why I split it is because sometimes you could optimize
> individual pieces of constant lookup -- for example, determine that lexical
> lookup will never/always succeed and ignore one or the other pieces and
> hardwire that result. Ex: class C; def self.foo; 5; end; end; p C.foo --
> this lookup of C will always succeed lexically. You can imagine the other
> scenario. A lot of constant lookups probably fall in these 2 categories.
> So, this is an opportunity for optimization (which hasn't been implemented
> yet) once we verify that we have the semantics right.

Yes, that's probably true, but the problem I always ran into trying to optimize one side or the other is that lexical lookup comes first. Because of that, any of the lexically-enclosing modules could gain that constant at any time.
This is ultimately why we only have a global invalidation for constants, and why I've never had much luck splitting the logic into pieces...they all have to check the same single token to cache anything, so I just stick them all behind the cache and validate only once that way.

> On the other hand, it is true that you can hide the whole thing behind an
> indy call site -- and it is possible to do the same with the individual
> pieces, just with more runtime code. On Dalvik, without indy support,
> the story is a little different, and I think there are benefits to
> individually optimizable lookup pieces.
>
> Anyway, at this time, I am not sure if this is helping more or getting in
> the way more -- something we can investigate/experiment with.

Sure, it's not insurmountable. The use of this special UNDEFINED value probably needs to change to RubyObject.UNDEF or similar, so I have something I can more easily look up in the compiler. And the constant lookup interpreter impls use some runtime checks against operations I'm going to have to think about for the JVM backend. But it's all doable, and most of it probably boils down to a single call in each case and some way of recognizing failed lookups.

- Charlie

----------
From: *Subramanya Sastry* <sss.li...@gmail.com>
Date: Mon, Apr 23, 2012 at 4:23 AM
To: Charles Oliver Nutter <head...@headius.com>

> Having the steps of colon2 and colon3 split out into IR operations
> will indeed make them easier for me to optimize, since retrieving the
> LHS of the :: and then retrieving the RHS will both be seen as
> separate operations. The current compiler is not quite as clean when
> it comes to dealing with those two, and can only optimize "simple"
> const lookup with indy.
>
> But, while I appreciate that, and the fact that more logic is split
> out in the instructions for the JIT...it does lead to a lot more
> bytecode being generated.
> Example: https://gist.github.com/2468662
>
> Of course I don't *know* that it's too much code, but some methods have
> a lot of constants in them...it's hard to say.

Let's see how this actually behaves when we get things running, and what it does to performance. Not too hard to roll back, if necessary.

> Anyway, I got hung up implementing constant lookup because of all the
> instructions; I set aside time this afternoon to implement a couple
> instrs, and it kept becoming more and more :) I'll see if I can't
> implement the full const lookup logic tomorrow.
>
> > Now, the other reason why I split it is because sometimes you could optimize
> > individual pieces of constant lookup -- for example, determine that lexical
> > lookup will never/always succeed and ignore one or the other pieces and
> > hardwire that result. Ex: class C; def self.foo; 5; end; end; p C.foo --
> > this lookup of C will always succeed lexically. You can imagine the other
> > scenario. A lot of constant lookups probably fall in these 2 categories.
> > So, this is an opportunity for optimization (which hasn't been implemented
> > yet) once we verify that we have the semantics right.
>
> Yes, that's probably true, but the problem I always ran into trying to
> optimize one side or the other is that lexical lookup comes first.
> Because of that, any of the lexically-enclosing modules could gain that
> constant at any time.

Is this always true, i.e., does this dynamic addition of constants affect all lexical const lookups? I would like to understand this better tomorrow.

Are you ever on IM? I have yahoo/gchat (subbu_ss/sss.lists). If you want to discuss on IRC, we can do that as well.

Subbu.
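To make the hazard Charlie describes concrete, here is a small runnable illustration (the module and constant names are invented for the example): adding a constant to a lexically enclosing module at runtime changes what an existing bare constant reference resolves to, which is why caching only one "piece" of the lookup in isolation is unsafe.

```ruby
# A bare constant reference re-resolves through the lexical scope chain,
# so a scope that gains the constant later will shadow an outer one.
X = "outer"   # top-level constant (Object::X)

module Enclosing
  def self.lookup_x
    X   # lexical search tries Enclosing first, then falls back to Object
  end
end

first = Enclosing.lookup_x           # Enclosing has no X yet: finds "outer"
Enclosing.const_set(:X, "inner")     # constant added dynamically
second = Enclosing.lookup_x          # same reference now finds "inner"
```

Since any enclosing scope of any reference can change this way at any time, all cached lookups share the fate of one global invalidation token, as described above.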
----------
From: *Subramanya Sastry* <sss.li...@gmail.com>
Date: Thu, Apr 26, 2012 at 1:16 AM
To: Charles Oliver Nutter <head...@headius.com>
Cc: Thomas Enebo <tom.en...@gmail.com>

I added back a SearchConstInstr which does all three things (lexical lookup, inheritance lookup, const_missing invocation). I didn't touch colon2 and colon3, however; they continue to use the split InheritanceSearch and ConstMissing instructions. You also wanted UndefinedValue to move to RubyObject -- yet to be done, since it is still used in the colon2 and colon3 cases.

-S.
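For reference, the semantic reason colon2/colon3 can safely keep the split instructions while bare lookups get the combined SearchConstInstr: a qualified reference like Names::TARGET never consults the lexical scope of the referencing code, so only the inheritance-search (plus const_missing) pieces apply. A small Ruby illustration with made-up module names:

```ruby
module Names
  TARGET = :found_in_names
end

module Client
  TARGET = :found_lexically

  # Bare reference: lexical search runs first and finds Client::TARGET.
  def self.bare
    TARGET
  end

  # Qualified (colon2) reference: only Names and its ancestors are
  # searched; Client::TARGET is never even considered.
  def self.qualified
    Names::TARGET
  end
end

Client.bare        # => :found_lexically
Client.qualified   # => :found_in_names
```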