On Mon, May 14, 2012 at 4:30 AM, Jochen Theodorou <blackd...@gmx.org> wrote:
> The special paths with guards in bytecode are actually something I was
> hoping to get rid of with indy. The current state of the indy
> implementation in Groovy is that it is slightly better than our call site
> caching and worse than our prim opts. In total that means that unless I
> combine indy with prim opts, the indy version is in general a tiny bit
> slower, since even the small advantage over call site caching is not
> always there. And call site caching in Groovy means we operate with
> classes generated at runtime and call sites that are mostly not inlined,
> among other problems. Indy has the potential to be faster than that. Only
> in reality I am missing that extra bit of performance, and that is a bit
> sad. We recently had another 2.0 beta, and a day later we already had
> people complaining that the indy version is not faster. I mean, if I find
> other places to optimize, then call site caching will profit from that as
> well, so indy gets no real advantage here.
>
> I am worried about indy getting a bad image here.
Well, keep the faith :) In JRuby, indy has been truly excellent...significantly
better than inline caching and many times better for boxed numerics (we do not
have primitive optimizations right now).

It is not without its warts, of course. Complex method handle chains or large
numbers of indy call sites can cause method bodies to fall off a performance
cliff (like John talked about last week). A key goal for JRuby's uses of indy
has been to keep the handles as simple as possible. I have also installed
several tuning flags to turn off the use of indy for certain cases, for users
who run into problems with it. I've tuned the length of polymorphic GWT chains
and made heavy use of SwitchPoint to reduce guard costs (there's a rough sketch
of the idea in the P.S. below).

Here's the red/black tree bench that's been going around...the compiler-level
optimizations are the same in both cases, but the latter numbers are with
invokedynamic (higher is better...iterations/sec).

No indy:

  #delete              12.0 (±0.0%)  i/s -     60 in 5.014000s
  #add                 26.3 (±0.0%)  i/s -    132 in 5.019000s
  #search              47.6 (±6.3%)  i/s -    240 in 5.065000s
  #inorder_walk       183.7 (±7.6%)  i/s -    918 in 5.041000s
  #rev_inorder_walk   212.9 (±3.8%)  i/s -   1080 in 5.080000s
  #minimum             92.4 (±1.1%)  i/s -    468 in 5.065000s
  #maximum             95.6 (±2.1%)  i/s -    486 in 5.086000s

With indy:

  #delete              35.1 (±5.7%)  i/s -    174 in 5.008000s
  #add                 69.9 (±2.9%)  i/s -    350 in 5.014000s
  #search             126.4 (±3.2%)  i/s -    640 in 5.069999s
  #inorder_walk       711.1 (±6.7%)  i/s -   3591 in 5.079000s
  #rev_inorder_walk   693.1 (±11.3%) i/s -   3422 in 5.027000s
  #minimum            305.3 (±2.0%)  i/s -   1530 in 5.013000s
  #maximum            282.2 (±1.8%)  i/s -   1428 in 5.062000s

So a 2-4x improvement on this benchmark *just* by using invokedynamic. This one
is not numeric-heavy, so boxing costs don't come into play as much, but to me
the results are incredibly promising. We've also had reports from users of
large, heterogeneous applications of at least doubled perf running on indy, and
in a couple of cases improvements of as much as 10x over non-indy perf.

I'm very happy with the results so far :)

- Charlie
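P.S. For anyone who wants a more concrete picture of the "keep the handles
simple" idea, below is a rough, self-contained sketch of a guarded call site
built from guardWithTest plus a SwitchPoint, in plain java.lang.invoke terms.
The class and method names (GuardedSiteSketch, slowPath, sameClass) and the
assumed (Object)Object call shape are invented for the example; this shows the
general shape of such a site, not JRuby's actual binding code.

import java.lang.invoke.*;

public class GuardedSiteSketch {

    // One SwitchPoint shared by every site that assumes "nothing was redefined".
    // Invalidating it flushes all of those sites at once.
    static final SwitchPoint MODIFICATION_SWITCH = new SwitchPoint();

    // Bootstrap method an invokedynamic instruction would point at.
    // Assumes the call shape is (Object receiver) -> Object for simplicity.
    public static CallSite bootstrap(MethodHandles.Lookup lookup,
                                     String name,
                                     MethodType type) throws Throwable {
        MutableCallSite site = new MutableCallSite(type);

        // Initial target is the slow path, which looks up the real method
        // and rebinds the site to a guarded fast path.
        MethodHandle slow = lookup
                .findStatic(GuardedSiteSketch.class, "slowPath",
                        MethodType.methodType(Object.class, MutableCallSite.class,
                                SwitchPoint.class, String.class, Object.class))
                .bindTo(site)
                .bindTo(MODIFICATION_SWITCH)
                .bindTo(name);
        site.setTarget(slow.asType(type));
        return site;
    }

    // Fallback/linker: find the target, wrap it in a cheap class guard plus
    // the SwitchPoint, install it, then finish the current call.
    public static Object slowPath(MutableCallSite site, SwitchPoint sp,
                                  String name, Object receiver) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();

        // Assumes a public no-arg method returning Object on the receiver class.
        MethodHandle target = lookup
                .findVirtual(receiver.getClass(), name, MethodType.methodType(Object.class))
                .asType(site.type());

        // Monomorphic guard: "is the receiver the class we linked against?"
        MethodHandle test = lookup
                .findStatic(GuardedSiteSketch.class, "sameClass",
                        MethodType.methodType(boolean.class, Class.class, Object.class))
                .bindTo(receiver.getClass());

        MethodHandle fallback = site.getTarget();   // back here on a guard miss
        MethodHandle guarded = MethodHandles.guardWithTest(test, target, fallback);

        // If SwitchPoint.invalidateAll is ever called on MODIFICATION_SWITCH
        // (e.g. a method was redefined), every site built through it drops
        // back to the fallback and re-links.
        site.setTarget(sp.guardWithTest(guarded, fallback));

        return target.invoke(receiver);
    }

    public static boolean sameClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }
}

The thing to notice is how little ends up on the hot path: a class check and a
SwitchPoint. Later calling SwitchPoint.invalidateAll(new SwitchPoint[]{
GuardedSiteSketch.MODIFICATION_SWITCH}) re-links every site built through it,
which is what keeps the per-call guards cheap.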