A thread emerges! I'm going to be taking some time this holiday to explore the performance of the new LF indy impl in various situations. This will be the thread where I gather observations.
A couple of preliminaries...

My perf exploration so far seems to show LF performing nearly on par with the old impl for the smallest benchmarks, with performance rapidly degrading as the size of the code involved grows. Recursive fib and tak have nearly identical perf on LF and the old impl. Red/black performs about the same on LF as with indy disabled, well behind the old indy performance. At some point, LF falls completely off the cliff and can't even compete with non-indy logic, as in a benchmark I ran today of Ruby constant access (heavily SwitchPoint-dependent).

Discussions with Christian seem to indicate that the fall-off is because non-inlined LF indy call sites perform very poorly compared to the old impl. I'll be trying to explore this and correlate the perf cliff with failure to inline. Christian has told me that (upcoming?) work on incremental inlining will help reduce the performance impact of the fall-off, but I'm not sure of the status of this work.

Some early ASM output from a trivial benchmark: loop 500M times calling #foo, which immediately calls #bar, which just returns the self object (ALOAD 2; ARETURN in essence). I've been comparing the new ASM to the old, both presented in a gist here:

https://gist.github.com/4365103

As you can see, the code resulting from both impls boils down to almost nothing, but there's one difference...

New code not present in old:

  0x0000000111ab27ef: je 0x0000000111ab2835  ;*ifnull
    ; - java.lang.Class::cast@1 (line 3007)
    ; - java.lang.invoke.LambdaForm$MH/763053631::guard@12
    ; - java.lang.invoke.LambdaForm$MH/518216626::linkToCallSite@14
    ; - ruby.__dash_e__::method__0$RUBY$foo@3 (line 1)

A side effect of inlining through LFs, I presume? Checking to ensure non-null call site? If so, shouldn't this have folded away, since the call site is constant? In any case, it's hardly damning to have an extra branch.
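For anyone who wants to poke at a similar call-site shape outside JRuby, here is a minimal Java sketch, not JRuby's actual binding logic. It assumes a guardWithTest-style site (suggested by the guard frame in the trace above) bound through a MutableCallSite. The class GuardedCallSiteSketch, the stand-in receiver class Obj, and the methods isObj, bar, fallback and foo are all hypothetical names made up for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class GuardedCallSiteSketch {
    // Hypothetical stand-in for a Ruby object; not a JRuby class.
    static final class Obj {}

    // Like #bar in the benchmark: just returns the receiver.
    static Object bar(Object self) { return self; }

    // Guard test: is the receiver the type this site was bound for?
    static boolean isObj(Object self) { return self instanceof Obj; }

    // Slow path. A real runtime would look up the method and rebind the site;
    // this sketch simply returns the receiver (an assumption for brevity).
    static Object fallback(Object self) { return self; }

    // Kept in a static final field so the JIT can treat the invoker as a constant.
    static final MethodHandle INVOKER;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(Object.class, Object.class);
            MethodHandle test = lookup.findStatic(GuardedCallSiteSketch.class, "isObj",
                    MethodType.methodType(boolean.class, Object.class));
            MethodHandle target = lookup.findStatic(GuardedCallSiteSketch.class, "bar", type);
            MethodHandle slow = lookup.findStatic(GuardedCallSiteSketch.class, "fallback", type);

            MutableCallSite site = new MutableCallSite(type);
            site.setTarget(MethodHandles.guardWithTest(test, target, slow));
            // dynamicInvoker() behaves like an invokedynamic instruction linked to the site.
            INVOKER = site.dynamicInvoker();
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Like #foo in the benchmark: immediately calls #bar through the call site.
    static Object foo(Object self) {
        try {
            return (Object) INVOKER.invokeExact(self);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Object self = new Obj();
        Object result = null;
        // Enough iterations to get foo compiled; the 500M in the benchmark is not needed here.
        for (int i = 0; i < 1_000_000; i++) {
            result = foo(self);
        }
        System.out.println(result == self);
    }
}
```

Running something like this with HotSpot's -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining (or -XX:+PrintAssembly, which needs the hsdis disassembler) should show whether the guard and target get inlined into foo. What it prints will depend on the JDK build, and it is only an approximation of a real JRuby site. The ASM in the gist above is from the actual JRuby benchmark, not from this sketch.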
This output is, at least, proof that LF *can* inline and optimize as well as the old impl...so we can put that aside for now. The questions to explore then are:

* Do cases expected to inline actually do so under LF impl?
* When inlining, does code optimize as it should (across the various shapes of call sites in JRuby, at least)?
* When code does not inline, how does it impact performance?

My expectation is that cases which should inline do so under LF, but that the non-inlined performance is significantly worse than under the old impl. The critical bit will be ensuring that even when LF call sites do not inline, they at least still compile to avoid interpretation and LF-to-LF overhead. At a minimum, it seems like we should be able to expect all LF between a call site and its DMH target will get compiled into a single unit, if not inlined into the caller. I still contend that call site + LFs should be heavily prioritized for inlining either into the caller or along with the called method, since they really *are* the shape of the call site. If there has to be a callq somewhere in that chain, there should ideally be only one.

So...here we go.

- Charlie
_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev