Charles Oliver Nutter wrote:
> Jochen Theodorou wrote:
>> ok, let me try to explain what I think of... The current system in 
>> Groovy works like this: you have a narrow API, with some core that 
>> actually selects and executes the method call. Ng is more or less the 
>> same, but with a wide API. What I plan for the future is to no longer 
>> let the core execute the methods; instead, the core returns handles to 
>> the call site and the call site calls the method for us.
> 
> Yes, I recall the discussions when this was implemented on trunk. 

it is a bit different on trunk... you can see that as an experimental 
version of parts of this. The problem is that the current protocol would 
normally not allow this, so we pollute the protocol with something that 
should not be there. From a design and specification standpoint that is 
a big problem, so 2.0 is intended to give a clean solution. Also, some 
parts can no longer be done with the old model. For example, if a 
property is requested and we should test whether private access is 
allowed, we currently have to allow it in general, because the protocol 
loses the information needed to test whether the access is ok or not.

> And 
> from my own tests, it definitely had improved performance, but I haven't 
> done a wide range of testing (as I'm sure you have, e.g. grails and 
> otherwise). One concern that occurs to me is how this affects the 
> locality of the call site.

In 1.5.x the call site does a method call to ScriptBytecodeAdapter, 
which can be inlined. From there it gets to the MetaClass through some 
quite complicated code, and I doubt much inlining is done there. In the 
end a reflective method is selected and called, and I am sure that can 
not be inlined. Well... to be frank, I think inlining could be a problem 
for ScriptBytecodeAdapter anyway, because the method there is 
megamorphic already. With call site caches we might not be able to 
inline the method selection parts, or the parts validating a method, but 
at least we get a mostly monomorphic call site.
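To illustrate the idea, here is a sketch in plain Java (made-up names, 
not the actual Groovy runtime classes): the call site caches the 
selected target behind a cheap class-equality guard, so repeated calls 
with the same receiver class take a monomorphic fast path that the JIT 
can inline, while only the slow path runs the full selection pipeline:

```java
import java.util.function.Function;

// Hypothetical sketch of a call site cache. After the first (slow)
// method selection, subsequent calls with the same receiver class
// take the fast, monomorphic path. Names are illustrative only.
class CallSiteSketch {
    private Class<?> cachedReceiverClass;            // guard for the cache
    private Function<Object, Object> cachedTarget;   // the selected "method"

    Object call(Object receiver) {
        // fast path: the guard is a cheap class check the JIT can inline
        if (receiver.getClass() == cachedReceiverClass) {
            return cachedTarget.apply(receiver);
        }
        return slowPathSelectAndCall(receiver);
    }

    private Object slowPathSelectAndCall(Object receiver) {
        // stands in for the full method-selection pipeline
        Function<Object, Object> target = Object::toString;
        cachedReceiverClass = receiver.getClass();
        cachedTarget = target;
        return target.apply(receiver);
    }
}
```

The selection logic itself still isn't inlinable, but it only runs on 
the first call (or after invalidation), which is the point made above.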

> Where in JRuby, the call site is never more 
> than a field access away, in Groovy it's retrieved from the same long 
> pipeline.

no... the call site itself is a field access too.

> So that pipeline has to be doing some amount of "getting in 
> the way" even if the call site encapsulates and eliminates a certain 
> portion of it. Or am I misunderstanding? This doesn't seem as much like 
> a call site optimization as simply currying a portion of the lookup 
> process into an object you then cache at the metaclass level for future 
> calls (and removing if there are changes).

Of course the purpose of the call site cache is to not go through the 
complete pipeline again. We select the method once, and unless the 
MetaClass has been changed there is no reason to go through it again. 
The default MetaClass does not allow changes, so there is no problem. 
EMC does allow changes, but then we ask EMC about the changes. Methods 
from other MetaClasses are simply not cached. Well, there is also 
ClosureMetaClass of course ;)
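A rough sketch of the invalidation idea, with hypothetical names (the 
real runtime differs in detail): if every mutation of a MetaClass bumps 
a version counter, the cached target stays valid as long as the counter 
is unchanged, so the validity check is a single int compare:

```java
// Hypothetical sketch: the cached target is valid only while the
// MetaClass it was selected from is unchanged. A version counter
// bumped on every mutation makes the validity check one int compare.
class MetaClassSketch {
    private int version = 0;
    int version() { return version; }
    void mutate() { version++; }   // e.g. a method added via EMC
}

class GuardedCallSite {
    private final MetaClassSketch metaClass;
    private int cachedVersion = -1;
    private String cachedMethod;
    private int slowPathHits = 0;

    GuardedCallSite(MetaClassSketch mc) { this.metaClass = mc; }

    String call() {
        if (cachedVersion != metaClass.version()) {
            // slow path: re-select the method and re-arm the cache
            slowPathHits++;
            cachedMethod = "selected@v" + metaClass.version();
            cachedVersion = metaClass.version();
        }
        return cachedMethod;
    }

    int slowPathHits() { return slowPathHits; }
}
```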

We had a cache at the MetaClass for a long time. It stored the argument 
types, the method name, and some other things. But creating a key for 
this cache and querying the cache was quite slow. In ClosureMetaClass I 
removed the cache and instead used the special properties of the 
Closure to write a specialized and simplified method selection. And 
this is up to 40% faster than the version with the cache. It is not 
that the cache doesn't bring any benefit; it does make method calls 
faster. But in the case of a closure I can use such a simple method 
selection algorithm that the cache is much slower, even without a miss.
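To show why the keyed cache can lose even without misses, here is an 
illustrative Java sketch (names are invented, not the old Groovy cache): 
the keyed variant must compose a key from the name and argument classes 
and do a map lookup on every call, while a specialized selector that 
knows closures only have a couple of candidate methods just branches:

```java
import java.util.*;

// Illustration only: per-call cost of a keyed cache vs. a
// specialized selector. Both "select" a method by returning a name.
class KeyedCache {
    private final Map<List<Object>, String> cache = new HashMap<>();

    String lookup(String name, Object[] args) {
        // per-call key construction: allocation + hashing on every call
        List<Object> key = new ArrayList<>();
        key.add(name);
        for (Object a : args) key.add(a.getClass());
        return cache.computeIfAbsent(key, k -> "method:" + name);
    }
}

class SpecializedSelector {
    // closure-style: only a handful of candidates exist, so selection
    // is a direct branch with no key object and no map lookup
    String lookup(String name, Object[] args) {
        return args.length == 1 ? "method:" + name : "doCall:" + name;
    }
}
```

Even on a cache hit, the keyed path pays for the key; the specialized 
path pays almost nothing, which matches the up-to-40% figure in spirit.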

> Perhaps a stack trace of a typical call through one of your "call sites" 
> would help illustrate the effect better?

ok, this code:
def foo() {throw new Exception("call site!")}
foo()

results in a trace like this:

> Caught: java.lang.Exception: call site!
> java.lang.Exception: call site!
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at 
> org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:70)
>       at 
> org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.invoke(ConstructorSite.java:84)
>       at 
> org.codehaus.groovy.runtime.callsite.CallSite.callConstructor(CallSite.java:142)
>       at test.foo(test.groovy:48)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at 
> org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:182)
>       at 
> org.codehaus.groovy.runtime.callsite.CallSite.callCurrent(CallSite.java:130)
>       at test.run(test.groovy:49)

test.groovy:49 is where foo() is called; test.groovy:48 is where the 
method foo() begins. As you can see, between these two frames there is 
only reflection code. Above that is the code for creating the 
exception, which could be improved a bit.

>> This design is very much oriented toward invokedynamic, but we came 
>> up with this before invokedynamic. Of course MethodHandles, as 
>> described by John Rose, will come in very handy here. Most of what 
>> can be done today with monkey patching and categories fits well into 
>> his new approach. I also plan to make a MetaClass no longer 
>> replaceable, while still allowing it to be mutated. The downside of 
>> this is that if you want, for example, to write code that reacts to 
>> each method call, you have to put that in a MetaMethod. But much of 
>> what is done today will work without change, I think.
> 
> That is a *big* change for the language, I think, but in my opinion a 
> very good one (and of course we've talked about this in the past). I 
> believe that Groovy's ability to not only replace methods (EMC) and 
> install categories, but to also wholesale replace metaclasses with 
> custom implementations, often implemented in Groovy themselves, is a 
> major barrier to optimization.

well, true... life would be a lot easier without these. But I think the 
replacement with a custom metaclass is the only thing we might throw 
out.

> I don't see the value in categories 
> myself, so I won't go there. 

I also plan a new kind of category, one that is lexically scoped. We 
may then remove the old categories... not sure yet what we will do with 
them.

> But in my opinion EMC should be the only 
> MC, enabled by default everywhere, with ruby-like hooks to augment its 
> behavior and no option for replacement. Then you're in a far better 
> position to install more optimistic optimizations.

this is planned... there are some things in EMC making life a bit 
difficult. Normally, in MetaClassImpl, each and every MetaClass is a 
replaceable and self-contained construct. One MetaClass does not need 
another to select a method. This has the advantage of being able to 
remove and recreate a MetaClass on demand, for example if memory is 
low. EMC is not collectable. Also, EMC might do lookups on the 
parent... well, we have to rework these parts and see whether it is ok 
to have the parents or not, and whether it is worth splitting the 
MetaClass into a part that can be collected and one that can't because 
it contains user-made modifications.
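The split could look roughly like this sketch (plain Java, hypothetical 
names; SoftReference stands in for whatever collection policy we would 
really use): user-made modifications are held strongly, while the part 
that can be re-derived from the Java class is held softly and rebuilt 
on demand after the garbage collector has reclaimed it:

```java
import java.lang.ref.SoftReference;
import java.util.*;

// Hypothetical sketch of the split discussed above: the derivable
// part of a MetaClass may be collected and rebuilt; user-made
// modifications must survive, so they are referenced strongly.
class SplitMetaClass {
    // user modifications: strong reference, never collected away
    private final Map<String, String> userMethods = new HashMap<>();
    // derivable part: collectable under memory pressure, rebuilt lazily
    private SoftReference<Map<String, String>> derived =
            new SoftReference<>(null);

    void addUserMethod(String name, String impl) {
        userMethods.put(name, impl);
    }

    String pick(String name) {
        String user = userMethods.get(name);
        if (user != null) return user;       // user part wins
        Map<String, String> d = derived.get();
        if (d == null) {                     // collected, or never built
            d = rebuildDerivedPart();
            derived = new SoftReference<>(d);
        }
        return d.get(name);
    }

    private Map<String, String> rebuildDerivedPart() {
        // stands in for reflecting over the Java class again
        Map<String, String> m = new HashMap<>();
        m.put("toString", "java:toString");
        return m;
    }
}
```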

>> I think this approach will allow a narrow API, with the core selecting 
>> the method, but not executing them. The actual call structure will be 
>> shallow and caching can be done at lots of places
>>
>> We plan on doing so too.. But only for a few cases that can be expected. 
>> In fact in Groovy the user can give type information, so if he does we 
>> can use that to predict methods and their result types. I plan such 
>> actions also for calls to private methods. This way the bytecode won't 
>> be that bloated
> 
> You'd be surprised. How big is a typical Groovy method in bytecode 
> right now? I'd wager a substantial portion of that is call 
> overhead... can you afford to double the size of some subset of operations?

If the tests show that it is not faster, then we won't do it.

> Here's a simple JRuby fib method, minus about 15 bytecodes worth of 
> preamble:
> 
> public org.jruby.runtime.builtin.IRubyObject 
> method__0$RUBY$fib_ruby(org.jruby.runtime.ThreadContext, 
> org.jruby.runtime.builtin.IRubyObject, 
> org.jruby.runtime.builtin.IRubyObject[], org.jruby.runtime.Block);
>    Code:
> .... preamble ....
>     45:       aload_1
>     46:       iconst_3
>     47:       invokestatic    #40; //Method 
> setPosition:(Lorg/jruby/runtime/ThreadContext;I)V
>     50:       aload_0
>     51:       getfield        #89; //Field site1:Lorg/jruby/runtime/CallSite;
>     54:       aload_1
>     55:       aload   11
>     57:       aload   6
>     59:       invokestatic    #95; //Method 
> org/jruby/RubyFixnum.two:(Lorg/jruby/Ruby;)Lorg/jruby/RubyFixnum;
>     62:       invokevirtual   #74; //Method 
> org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>     65:       invokeinterface #101,  1; //InterfaceMethod 
> org/jruby/runtime/builtin/IRubyObject.isTrue:()Z
>     70:       ifeq    83
>     73:       aload_1
>     74:       iconst_4
>     75:       invokestatic    #40; //Method 
> setPosition:(Lorg/jruby/runtime/ThreadContext;I)V
>     78:       aload   11
>     80:       goto    145
>     83:       aload_1
>     84:       bipush  6
>     86:       invokestatic    #40; //Method 
> setPosition:(Lorg/jruby/runtime/ThreadContext;I)V
>     89:       aload_0
>     90:       getfield        #106; //Field site2:Lorg/jruby/runtime/CallSite;
>     93:       aload_1
>     94:       aload_0
>     95:       getfield        #111; //Field site3:Lorg/jruby/runtime/CallSite;
>     98:       aload_1
>     99:       aload_2
>     100:      aload_0
>     101:      getfield        #116; //Field site4:Lorg/jruby/runtime/CallSite;
>     104:      aload_1
>     105:      aload   11
>     107:      aload   6
>     109:      invokestatic    #95; //Method 
> org/jruby/RubyFixnum.two:(Lorg/jruby/Ruby;)Lorg/jruby/RubyFixnum;
>     112:      invokevirtual   #74; //Method 
> org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>     115:      invokevirtual   #74; //Method 
> org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>     118:      aload_0
>     119:      getfield        #119; //Field site5:Lorg/jruby/runtime/CallSite;
>     122:      aload_1
>     123:      aload_2
>     124:      aload_0
>     125:      getfield        #122; //Field site6:Lorg/jruby/runtime/CallSite;
>     128:      aload_1
>     129:      aload   11
>     131:      aload   6
>     133:      invokestatic    #125; //Method 
> org/jruby/RubyFixnum.one:(Lorg/jruby/Ruby;)Lorg/jruby/RubyFixnum;
>     136:      invokevirtual   #74; //Method 
> org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>     139:      invokevirtual   #74; //Method 
> org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>     142:      invokevirtual   #74; //Method 
> org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>     145:      areturn
> 
> Now this bytecode is pretty tight. There are some special-case methods 
> for Fixnum 1 and 2, CallSite objects to encapsulate some boilerplate 
> call-wrapping logic, and "setPosition" calls to update the Ruby stack 
> trace, but otherwise we've managed to boil it down a lot. And it's still 
> a lot of code. I've been doing a bytecode audit recently to make sure 
> all bytecode generated is as clean as possible, and this is the result 
> at the moment (trunk code). What's a comparable fib method in Groovy 
> look like with the new call site stuff?

def fib(n){
   if (n<2) return 1
   return fib(n-1)+fib(n-2)
}

>   public fib(Ljava/lang/Object;)Ljava/lang/Object;
>     TRYCATCHBLOCK L0 L1 L1 groovy/lang/GroovyRuntimeException
>    L0
>     INVOKESTATIC fib.$getCallSiteArray 
> ()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
>     ASTORE 2
>    L2
>     LINENUMBER 2 L2
>     ALOAD 1
>     GETSTATIC fib.$const$0 : Ljava/lang/Integer;
>     INVOKESTATIC 
> org/codehaus/groovy/runtime/ScriptBytecodeAdapter.compareLessThan 
> (Ljava/lang/Object;Ljava/lang/Object;)Z
>     IFEQ L3
>    L4
>     LINENUMBER 2 L4
>     GETSTATIC fib.$const$1 : Ljava/lang/Integer;
>     ARETURN
>     GOTO L3
>    L3
>     LINENUMBER 3 L3
>     ALOAD 2
>     LDC 1
>     AALOAD
>    L5
>     LINENUMBER 3 L5
>     ALOAD 2
>     LDC 2
>     AALOAD
>     ALOAD 0
>    L6
>     LINENUMBER 3 L6
>     ALOAD 2
>     LDC 3
>     AALOAD
>     ALOAD 1
>     GETSTATIC fib.$const$1 : Ljava/lang/Integer;
>     INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callBinop 
> (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
>     INVOKESTATIC org/codehaus/groovy/runtime/ArrayUtil.createArray 
> (Ljava/lang/Object;)[Ljava/lang/Object;
>     INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callCurrent 
> (Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>    L7
>     LINENUMBER 3 L7
>     ALOAD 2
>     LDC 4
>     AALOAD
>     ALOAD 0
>    L8
>     LINENUMBER 3 L8
>     ALOAD 2
>     LDC 5
>     AALOAD
>     ALOAD 1
>     GETSTATIC fib.$const$0 : Ljava/lang/Integer;
>     INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callBinop 
> (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
>     INVOKESTATIC org/codehaus/groovy/runtime/ArrayUtil.createArray 
> (Ljava/lang/Object;)[Ljava/lang/Object;
>     INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callCurrent 
> (Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>     INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callBinop 
> (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
>     ARETURN
>    L9
>     GOTO L10
>    L1
>     INVOKESTATIC org/codehaus/groovy/runtime/ScriptBytecodeAdapter.unwrap 
> (Lgroovy/lang/GroovyRuntimeException;)Ljava/lang/Throwable;
>     ATHROW
>    L10
>     NOP
>     LOCALVARIABLE this Lfib; L0 L9 0
>     LOCALVARIABLE n Ljava/lang/Object; L0 L9 1
>     MAXSTACK = 7
>     MAXLOCALS = 3

that's around 67 lines. If I remove the labels and line number entries, 
as well as the epilogue, I get about 45 lines/instructions.

> Of course I'm not saying to go for it... I'm going to try to do the same 
> thing with profiling data gathered during interpretation, if I can find 
> a reasonable way to shrink the bytecode duplication to a reasonable 
> level. But I think tricks that depend on type annotations are really not 
> in the spirit of the language...and if possible I would help you explore 
> ways to optimize normal dynamic invocation more first, because I think 
> that's where the most generally applicable gains are going to come from.

I think with call site caching we are doing quite well already. I see 
more of a problem in the continuous boxing actions... For example, the 
same method, but this time with ints:

int fib(int n){
   if (n<2) return 1
   return fib(n-1)+fib(n-2)
}

will contain code boxing the int parameter into an Object:

> INVOKESTATIC 
> org/codehaus/groovy/runtime/typehandling/DefaultTypeTransformation.box 
> (I)Ljava/lang/Object

and code transforming the return value back into an int:

> INVOKESTATIC fib.$get$$class$java$lang$Integer ()Ljava/lang/Class;
> INVOKESTATIC org/codehaus/groovy/runtime/ScriptBytecodeAdapter.castToType 
> (Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object;
> CHECKCAST java/lang/Integer
> INVOKESTATIC 
> org/codehaus/groovy/runtime/typehandling/DefaultTypeTransformation.intUnbox 
> (Ljava/lang/Object;)I
> IRETURN

of course twice, because we have two returns. Even for this "return 1" 
we first create an Integer and then unbox it. Even if the Integer 
object is cached, it is still something we can optimize away.

If the code were written like this:

int fib(int n){
   if (n<2) return 1
   int a = fib(n-1)
   int b = fib(n-2)
   return a+b
}

then we would have two more casts, but no unboxing, since we do not 
unbox for local variables.
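To make the boxing round trip concrete, here is an illustrative Java 
sketch (not generated Groovy code) of what the bytecode effectively 
does for int fib(int n), next to the primitive-only version we would 
like to emit:

```java
// Illustration of the boxing round trip the generated bytecode
// performs: the result is produced as a boxed Integer and then
// immediately cast and unboxed at the return, even for the constant 1.
class BoxingRoundTrip {

    // mimics what the generated code does for "int fib(int n)"
    static int fibBoxed(int n) {
        Object result = fibBody(n);           // result is a boxed Integer
        return ((Integer) result).intValue(); // cast + unbox before IRETURN
    }

    static Object fibBody(int n) {
        if (n < 2) return Integer.valueOf(1); // "return 1" still boxes
        return Integer.valueOf(fibBoxed(n - 1) + fibBoxed(n - 2));
    }

    // what an optimized, primitive-only version could look like:
    // no allocation, no cast, no unbox anywhere on the hot path
    static int fibPrimitive(int n) {
        if (n < 2) return 1;
        return fibPrimitive(n - 1) + fibPrimitive(n - 2);
    }
}
```

Both versions compute the same values; the difference is purely the 
box/cast/unbox traffic on every call and return.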

>> to say the truth, Groovy is fast enough for me, even if it is sometimes 
>> 5-100 times slower than Java. It is quite easy to get the speed very 
>> much up. But a language is not only about what the implementors want and 
>> a community driven language like Groovy especially not. Groovy is no 
>> academic language where you write papers when you have a good idea. 
>> Instead a language is also much about politics, and if the public 
>> demands more speed, then we will do our best. Also, there are people 
>> afraid of dynamic languages, and we need to show them that they don't 
>> need to be slow just because they are dynamic.
> ...
>> Well, in a benchmark like the Alioth Shootout you are not allowed to use 
>> this obvious solution. That gives bad press. And since a language is so 
>> much about politics, you have to handle bad press somehow
> 
> I hate having to worry about performance, but I love optimizing it. The 
> world is far too performance obsessed, but there are reasons for it. I 
> would strongly caution against optimizations designed to make specific 
> benchmarks fast, even if the political gains would be substantial. Ruby 
> 1.9 added fast-path Fixnum math operators and ended up looking great on 
> a lot of benchmarks. Then more and more complaints started to come in 
> that they resulted in slowing down *everything non-Fixnum type* because 
> of the extra typechecking involved.

sure, but still our "dream" is to be able to support native 
primitive-based operations. We will keep an eye on what happens to 
other code.

bye blackdrag

-- 
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/
http://www.g2one.com/

You received this message because you are subscribed to the Google Groups "JVM 
Languages" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/jvm-languages?hl=en
