It seems like the natural solution! :) Invokedynamic is nothing without the handles wiring it up...so they should always live happily together in the land of fairies and unicorns.
- Charlie On Mon, Oct 17, 2011 at 5:32 PM, Christian Thalinger <christian.thalin...@oracle.com> wrote: > You are really advocating this :-) > -- Chris > On Oct 17, 2011, at 7:11 PM, Charles Oliver Nutter wrote: > > More justification for this... > > Not inlining the handles would be like invokevirtual not emitting its type > check inline and doing that as a separate CALL. Handles are our way to tell > the JVM what native operations go with an invokedynamic. Not emitting those > operations unconditionally into the compiled code treats invokedynamic as a > second class citizen. Fair to say? > > - Charlie (mobile) > > On Oct 17, 2011 11:52 AM, "Charles Oliver Nutter" <head...@headius.com> > wrote: >> >> Yeah that does sound like the right approach the more I think about it. >> The invokedynamic operation and the handles that wire it up should never be >> separated. Pathology aside, I know the JRuby logic would remain pretty small >> in almost every case. And pathological cases could be detected with some >> kind of MH graph-walking calculation (with a suitably large budget since >> handles individually are maybe a couple native operations each). >> >> Let me know if you want me to test something out. I *really* hope this can >> get into u2. >> >> - Charlie (mobile) >> >> On Oct 17, 2011 9:28 AM, "Christian Thalinger" >> <christian.thalin...@oracle.com> wrote: >>> >>> On Oct 17, 2011, at 4:14 PM, Charles Oliver Nutter wrote: >>> >>> Yeah that's a pretty big problem :) Indy almost becomes a nonstarter if >>> these degraded cases are not made a lot better. The performance on this >>> example becomes terrible, and it's common for Ruby methods to be larger and >>> more complex than this, too. >>> >>> Yeah, I know. >>> >>> You say you don't know how to fix it, so perhaps we can brainstorm a bit? >>> If we can't inline the call we ate least want it to perform like an >>> uninlined method call. Could we still compile the MH chain and CALL it, so >>> at least we're not doing c2i? Or perhaps we should always treat the MH chain >>> as inlinable regardless of budgets, but not inline the eventual call when >>> budgets are exceeded? The latter option sounds more correct to me; the MH >>> chain should be considered a non-removable part of the invokedynamic >>> operation and always inlined. That would avoid the degradation without >>> blowing up code size. >>> >>> I didn't say I don't know how to fix it but rather what's the best >>> approach. Internally we (Tom, John and I) already talked about this and I >>> tried to do something what you suggest above: compile the MH chain and call >>> it. It turned out it's not that easy (as I thought) and needs some special >>> handling. >>> An interesting idea is to always inline the "adaption" code of the MH >>> chain but not the eventual call (or all calls in the chain). I would guess >>> that for normal usage the added code size is insignificant but there can >>> definitely be pathological cases. I try to get some data on that. >>> -- Chris >>> >>> - Charlie (mobile) >>> >>> On Oct 17, 2011 4:58 AM, "Christian Thalinger" >>> <christian.thalin...@oracle.com> wrote: >>>> >>>> On Oct 15, 2011, at 2:56 PM, Charles Oliver Nutter wrote: >>>> >>>> > I'm seeing something peculiar and wanted to run it by you folks. >>>> > >>>> > There are a few values that JRuby's compiler had previously been >>>> > loading from instance fields every time they're needed. Specifically, >>>> > fields like ThreadContext.runtime (the current JRuby runtime), >>>> > Ruby.falseObject, Ruby.trueObject, Ruby.nilObject (false, true, and >>>> > nil values). I figured I'd make a quick change today and have those >>>> > instead be constant method handles bound into a mutable call site. >>>> > >>>> > Unfortunately, performance seems to be worse. >>>> > >>>> > The logic works like this: >>>> > >>>> > * ThreadContext is loaded to stack >>>> > * invokedynamic, bootstrap just wires up an initialization method into >>>> > a MutableCallSite >>>> > * initialization method rebinds call site forever to a constant method >>>> > handle pointing at the value (runtime/true/false/nil objects) >>>> > >>>> > My expectation was that this would be at least no slower (and >>>> > potentially a tiny bit faster) but also less bytecode (in the case of >>>> > true/false/nil, it was previously doing >>>> > ThreadContext.runtime.getNil()/getTrue()/getFalse()). It seems like >>>> > it's actually slower than walking those references, though, and I'm >>>> > not sure why. >>>> > >>>> > Here's a couple of the scenarios in diff form showing bytecode before >>>> > and bytecode after: >>>> > >>>> > Loading "runtime" >>>> > >>>> > ALOAD 1 >>>> > - GETFIELD org/jruby/runtime/ThreadContext.runtime : >>>> > Lorg/jruby/Ruby; >>>> > + INVOKEDYNAMIC getRuntime >>>> > (Lorg/jruby/runtime/ThreadContext;)Lorg/jruby/Ruby; >>>> > >>>> > [org/jruby/runtime/invokedynamic/InvokeDynamicSupport.getObjectBootstrap(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/St >>>> > ring;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite; (6)] >>>> > >>>> > Loading "false" >>>> > >>>> > ALOAD 1 >>>> > - GETFIELD org/jruby/runtime/ThreadContext.runtime : >>>> > Lorg/jruby/Ruby; >>>> > - INVOKEVIRTUAL org/jruby/Ruby.getFalse ()Lorg/jruby/RubyBoolean; >>>> > + INVOKEDYNAMIC getFalse >>>> > (Lorg/jruby/runtime/ThreadContext;)Lorg/jruby/RubyBoolean; >>>> > >>>> > [org/jruby/runtime/invokedynamic/InvokeDynamicSupport.getObjectBootstrap(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite; >>>> > (6)] >>>> > >>>> > I think because these are now seen as invocations, I'm hitting some >>>> > inlining budget limit I didn't hit before (and which isn't being >>>> > properly discounted). The benchmark I'm seeing degrade is >>>> > bench/language/bench_flip.rb, and it's a pretty significant >>>> > degradation. Only the "heap" version shows the degradation, and it >>>> > definitely does have more bytecode...but the bytecode with my patch >>>> > differs only in the way these values are being accessed, as shown in >>>> > the diffs above. >>>> > >>>> > Before: >>>> > user system >>>> > total real >>>> > 1m x10 while (a)..(!a) (heap) 0.951000 0.000000 >>>> > 0.951000 ( 0.910000) >>>> > user system >>>> > total real >>>> > 1m x10 while (a)..(!a) (heap) 0.705000 0.000000 >>>> > 0.705000 ( 0.705000) >>>> > user system >>>> > total real >>>> > 1m x10 while (a)..(!a) (heap) 0.688000 0.000000 >>>> > 0.688000 ( 0.688000) >>>> > user system >>>> > total real >>>> > >>>> > After: >>>> > user system >>>> > total real >>>> > 1m x10 while (a)..(!a) (heap) 2.350000 0.000000 >>>> > 2.350000 ( 2.284000) >>>> > user system >>>> > total real >>>> > 1m x10 while (a)..(!a) (heap) 2.128000 0.000000 >>>> > 2.128000 ( 2.128000) >>>> > user system >>>> > total real >>>> > 1m x10 while (a)..(!a) (heap) 2.115000 0.000000 >>>> > 2.115000 ( 2.116000) >>>> > user system >>>> > total real >>>> > >>>> > You can see the degradation is pretty bad. >>>> > >>>> > I'm concerned because I had hoped that invokedynamic + mutable call >>>> > site + constant handle would always be faster than a field >>>> > access...since it avoids excessive field accesses and makes it >>>> > possible for Hotspot to fold those constants away. What's going on >>>> > here? >>>> >>>> I looked into this and the main issue here is an old friend: slow >>>> invokes of non-inlined MH call sites. The problem is that you trade a >>>> normal invoke (to a field load?) with a MH invoke. If the normal invoke >>>> doesn't get inlined we're good but if the MH invoke doesn't get inlined >>>> we're screwed (since we are still doing the C2I-I2C dance). >>>> >>>> I refactored the benchmark a little (moved stack and heap loops into its >>>> own methods and only do 5 while-loops instead of 11; that inlines all calls >>>> in that method) and the performance is like you had expected (a little >>>> faster): >>>> >>>> 32-bit: >>>> >>>> before: >>>> >>>> 1m x10 while (a)..(!a) (stack) 0.214000 0.000000 0.214000 >>>> ( 0.214000) >>>> 1m x10 while (a)..(!a) (heap) 0.249000 0.000000 0.249000 >>>> ( 0.250000) >>>> >>>> after: >>>> >>>> 1m x10 while (a)..(!a) (stack) 0.203000 0.000000 0.203000 >>>> ( 0.203000) >>>> 1m x10 while (a)..(!a) (heap) 0.234000 0.000000 0.234000 >>>> ( 0.234000) >>>> >>>> 64-bit: >>>> >>>> before: >>>> >>>> 1m x10 while (a)..(!a) (stack) 0.248000 0.000000 0.248000 >>>> ( 0.248000) >>>> 1m x10 while (a)..(!a) (heap) 0.257000 0.000000 0.257000 >>>> ( 0.257000) >>>> >>>> after: >>>> >>>> 1m x10 while (a)..(!a) (stack) 0.226000 0.000000 0.226000 >>>> ( 0.226000) >>>> 1m x10 while (a)..(!a) (heap) 0.244000 0.000000 0.244000 >>>> ( 0.244000) >>>> >>>> We have to fix that but I'm not sure yet what's the best approach. >>>> Sorry I don't have better news for now. >>>> >>>> -- Chris >>>> >>>> > >>>> > Patch for the change (apply to JRuby master) is here: >>>> > https://gist.github.com/955976b52b0c4e3f611e >>>> > >>>> > - Charlie >>>> > _______________________________________________ >>>> > mlvm-dev mailing list >>>> > mlvm-dev@openjdk.java.net >>>> > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>>> >>>> _______________________________________________ >>>> mlvm-dev mailing list >>>> mlvm-dev@openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev@openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev@openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> > _______________________________________________ > mlvm-dev mailing list > mlvm-dev@openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev@openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev