First about the bug that classes were loaded only once: This did not
happen in my case for the GroovyFilling because I used different
directories for ClassGCTester and GroovyFilling, so the parent of the
URLClassLoader could not load the test class, but it happened
(unintentionally) in my tests for JavaFilling which is why I though that
loading Java classes was so incredibly fast... :(
With that fixed, I got much more reasonable output, about
0.1ms+0.02ms=0.12ms for loading JavaFilling and about 0.4ms + 0.9ms =
1.3ms for the original GroovyFilling. So, there is not all that much of
room for improvement, at least a lot less than I feared, considering
that Groovy classes do have more capabilities.
$ java -cp . ClassGCTester filling/ JavaFilling
Secs Test classes Metaspace/PermGen Heap Load time Create
time
#loaded #remaining used committed used
committed average average
0 1 1 2.9m 4.8m 2.6m 245.5m
0.812ms 0.098ms
1 5577 2803 7.4m 21.3m 37.9m 177.5m
0.136ms 0.028ms
2 13610 3437 8.2m 25.9m 45.6m 361.0m
0.114ms 0.024ms
3 21579 2836 7.4m 24.0m 38.3m 513.5m
0.109ms 0.023ms
4 29582 2269 6.7m 21.8m 31.6m 733.5m
0.106ms 0.022ms
5 37792 1909 6.3m 20.5m 28.7m 990.5m
0.104ms 0.022ms
6 45894 1441 5.7m 18.6m 22.3m 1257.0m
0.103ms 0.022ms
7 54049 1026 5.2m 17.6m 19.1m 1575.5m
0.102ms 0.021ms
$ java -XX:MaxMetaspaceSize=64m -Xmx512m -Dgroovy.use.classvalue=true
-cp .:groovy-2.4.6.jar ClassGCTester filling/ GroovyFilling
Secs Test classes Metaspace/PermGen Heap Load time Create
time
#loaded #remaining used committed used
committed average average
0 1 1 6.3m 6.5m 13.4m 245.5m
0.875ms 11.525ms
1 474 474 9.0m 10.3m 36.5m 245.5m
0.350ms 1.694ms
2 1312 1312 12.2m 15.1m 103.7m 309.5m
0.265ms 1.224ms
3 2291 2291 16.0m 21.0m 87.4m 389.0m
0.417ms 1.034ms
4 2977 2977 18.6m 25.1m 178.4m 389.0m
0.360ms 0.962ms
5 4065 4065 22.8m 31.5m 216.0m 389.0m
0.307ms 0.905ms
6 4641 4641 25.0m 34.9m 164.0m 455.5m
0.444ms 0.892ms
7 5314 5314 27.6m 38.8m 213.2m 455.5m
0.412ms 0.888ms
What I still observed was that for loading JavaFilling, Metaspace does
not grow indefinitely even without a limit (see above), but it does for
GroovyFilling, which I can also understand. Would be nice if it was
possible that Groovy classes were also collected so quickly, but in
practice I guess once you know that, you just have to set a reasonable
maximum for Metaspace, and when you operate a "server environment" you
have to take care of these kinds of things anyway, I would say.
So, essentially I am quite happy with the results and with how Groovy
fares :)
I have made a "release" 1.1.0 of ClassGCTester with an added check that
the test class cannot be loaded from the classpath of ClassGCTester
alone and with a fix for the display of Metaspace/PermGen (this matches
now roughly the output of "jstat -gc ..." for MC and MU, resp. the
equivalents for PermGen), plus updated the readme and examples.
So far for the "broad picture", some feedback regarding your detailed
analysis hopefully a bit later, I think I will first read it again a few
more times - very interesting... :)
Alain
On 13.05.16 13:03, Jochen Theodorou wrote:
On 13.05.2016 02:22, Alain Stalder wrote:
[...]
Qualitatively this often has the following result in the Java VM:
Metaspace resp. PermGen, and Heap in parallel, just grow until a
configured limit is reached (and note that there is none by default for
Metaspace in Java 8 and later), often only then is it garbage collected.
With Java classes, at least with simple ones, this looks often
different, those appear to be garbage collected much more quickly.
Another qualitative difference is that loading a Groovy class and
instantiating it seems typically to be considerably slower than
instantiating a Java class with similar functionality, even quite
drastically so, more than one would expect even considering the need to
create metadata for dynamic function calls etc.
At least that has been my experience over the past few years.
this is going to be a long mail ;)
so let us make three things to discuss here:
1) Object initialization performance
2) class verification/complexity
3) garbage collection of unused classes
And we have to distinguish here between usages of ClassValue,
invokedynamic and the traditional version of those... that makes 4
aspects.
So I will write several things you probably already know, but other
reading here might not. And even though I simplify a bit, please bear
with me ;) And first of all, let us talk about the meta class
system... that mostly targets ClassValue then.
The old version of the meta class system uses a big global table for
all meta classes, with a class key and ClassInfo as value. In
ClassInfo we have the meta class, which might be either recreate-able
(in which case the meta class is soft referenced) or not (in which
case it is a strong reference). The idea being, that as soon as the
class can be garbage collected, the ClassInfo can as well, and with it
the meta class.
Problem 1 here is, the meta class holds a strong reference to the
class, so if the ClassInfo holds a strong reference to the meta class,
this entry in the table will never be collected. I mention this only
for completeness, since you did not set a permanent meta class in your
test
Problem 2 here is, the code is concurrent, which rules out WeakHashMap
and forced us to implement our own map.
In the ClassValue version we do not have our own table anymore and let
the JVM manage this.
To avoid the lookup cost of the meta class, every Groovy class has a
reference to its ClassInfo. The meta class and ClassInfo are lazy
initialized.... well, "populated with actual data" in case of ClassInfo.
Some classes extend GroovyObjectSupport, which does the initialization
in the constructor already, groovy.lang.Script is one of them. That
means every time you create an instance of a new script class, you get
the meta class already, even if the meta class is not used. let us
have a small look at the bytecode of the constructors (x.groovy which
only returns 42) of such a script:
// access flags 0x1
public <init>()V
ALOAD 0
INVOKESPECIAL groovy/lang/Script.<init> ()V
L0
INVOKESTATIC x.$getCallSiteArray
()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
ASTORE 1
L1
RETURN
// access flags 0x1
public <init>(Lgroovy/lang/Binding;)V
L0
INVOKESTATIC x.$getCallSiteArray
()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
ASTORE 2
ALOAD 0
ALOAD 1
INVOKESPECIAL groovy/lang/Script.<init> (Lgroovy/lang/Binding;)V
L1
RETURN
as you can see here, there is a getCallSiteArray call in here and
direct methods calls for initialization. The getCallSiteArray call in
here is actually surplus, but it is difficult to decide in the
compiler early on if we will need it or not, because the callsite
array here is basically an array wrapper, which supplies method names
for method calls as well as specialized code for doing dynamic method
calls. Would the constructor for example contain a method call to a
method foo(), you would see some thing like getting the CallSite and
executing "call" on it.
Why are we doing static method calls in the constructor here? Because
in several cases the compiler optimizes the dynamic call away here.
Basically you cannot provide a super constructor in Groovy, which
means only the one statically defined does count. And as long as the
given types are matching enough for the compiler to decide, we can
create direct method calls.
So what does this mean for the object initialization performance in
cases of Scripts so far? We eagerly initialize ClassInfo and MetaClass
for each script. That means a lookup by reflection of the complete
structure of the class and its super classes... which is cached, so
the super class lookup will be much lower the next time. initializing
the first meta class will also initialize the extension method lookup,
which can also take a bit time... again, a one-time cost here.
The invokedynamic version will again use the jdk internals for the
callsites, thus $getCallSiteArray is never called nor are the methods
around this created. This saves bytecode size (faster verification),
and in fact the two mentioned constructors above would look just like
they would in Java. This of course still implies the meta class system
init.
Now... the actual code of the script is to be found in the run method,
which looks like this in old groovy:
// access flags 0x1
public run()Ljava/lang/Object;
L0
INVOKESTATIC x.$getCallSiteArray
()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
ASTORE 1
L1
LINENUMBER 1 L1
BIPUSH 42
INVOKESTATIC java/lang/Integer.valueOf (I)Ljava/lang/Integer;
ARETURN
L2
ACONST_NULL
ARETURN
again we see the call to get the callsite array. But otherwise the
method should like this in Java too. Since we are not calling any
actual methods and just return 42, we have again needless overhead in
this. Accordingly the invokedynamic version looks like this:
// access flags 0x1
public run()Ljava/lang/Object;
L0
LINENUMBER 1 L0
BIPUSH 42
INVOKESTATIC java/lang/Integer.valueOf (I)Ljava/lang/Integer;
ARETURN
L1
FRAME FULL [] [java/lang/Throwable]
NOP
ATHROW
Frames appear in this one, because the old groovy bytecode is java5
compatible, while invokedynamic requires java7 and thus the new faster
verifier with frames. The frames basically support the verifier to
make it faster.... I never tested if that is actually the case.
To sum it up a little so far.... the invokedynamic version uses less
bytecode and the new verifier 1685 vs 2140, both could execute that
script without ever using the meta class system, if run is called
directly and if not Script would initialize the meta class eagerly. Of
course the later is for nil as soon as you have some method calls in
there, which are not static.
Since we now have some basics, we should start talking about your test
case.
First thing... I don´t think you load the class new every time. (Taken
from my machine, with java8u25 and Groovy 2.4.6)
0 1 1 10.7m 11.4m 14.4m
240.0m 0.105ms 0.061ms
1 760081 1 11.4m 11.8m 28.3m
246.5m 0.000ms 0.000ms
I see here the first time class loading with 105 and a creation time
with 61. This class loading time has to appear every time and must
never be 0. But that is the case. So even if you create a new
URLClassLoader without defining a root, it still has the system class
loader as parent. Providing null as parent class loader fixes that
problem. And then I got a more interesting:
0 1 1 10.6m 11.4m 14.4m
240.0m 0.213ms 0.063ms
1 11519 2227 16.2m 35.9m 24.2m
246.0m 0.060ms 0.022ms
2 27103 4055 19.4m 46.8m 40.7m
229.0m 0.051ms 0.019ms
3 42465 1 12.2m 34.4m 4.5m
235.0m 0.049ms 0.019ms
4 57606 580 13.3m 34.4m 9.0m
236.0m 0.048ms 0.018ms
5 73506 1918 15.8m 39.1m 21.3m
236.0m 0.047ms 0.018ms
6 89212 2482 16.8m 43.0m 25.8m
212.5m 0.047ms 0.018ms
7 104578 1 12.4m 41.7m 4.0m
232.0m 0.046ms 0.018ms
actually this was with java5 compatible code... with java8 standard I
get something like this:
0 1 1 10.6m 11.4m 14.4m
240.0m 0.214ms 0.029ms
1 12117 2861 17.3m 38.4m 30.4m
246.0m 0.062ms 0.016ms
2 29143 1303 14.6m 36.6m 15.8m
236.0m 0.052ms 0.014ms
3 46633 4267 19.8m 48.3m 43.2m
235.0m 0.048ms 0.013ms
4 62634 900 14.0m 35.2m 11.4m
236.0m 0.048ms 0.013ms
5 80618 4358 20.1m 48.7m 43.5m
235.5m 0.046ms 0.013ms
6 97964 2336 16.6m 41.0m 25.3m
236.5m 0.046ms 0.012ms
This makes me actually wondering a bit as of why the creation time
goes down, but not the load time. And after some thinking things
become clear to me...when you load a class using loadClass, the class
is not verified immediately. This happens lazy on actual static
initialization or once you ask the class for its structure... like
when using getMethods. So what load time tests here is purely the
class loader performance, nothing else. A difference in load time
would thus indicate solely that more or less classes are used. This
also means the creation time here includes the verification time of
the class, which I guess amounts to the .006 in difference between the
two.
Now lets get further and change the simple java class with a run
method to extend Groovy´s script class.. well this requires further
changes, since now of course giving the groovy jar on the classpath
will no longer enable the spawned URLClassloader to load groovy
classes. Anyway... after fixing that:
0 1 1 13.4m 13.8m 24.0m
240.0m 1.290ms 90.612ms
1 12 12 39.3m 40.1m 64.3m
190.0m 0.985ms 82.673ms
2 23 23 65.0m 66.5m 56.0m
246.0m 1.024ms 88.072ms
3 34 5 29.3m 46.8m 27.6m
230.5m 0.976ms 88.245ms
4 48 19 54.8m 64.0m 39.2m
228.0m 0.929ms 83.359ms
5 60 1 19.6m 51.3m 13.2m
228.0m 0.892ms 84.574ms
6 73 14 42.6m 55.8m 36.8m
228.0m 0.859ms 82.042ms
...
46 593 1 21.8m 62.2m 7.9m
220.5m 0.783ms 76.878ms
47 607 15 47.0m 62.3m 41.8m
219.0m 0.787ms 76.695ms
48 621 29 78.3m 82.5m 44.5m
220.5m 0.782ms 76.563ms
nothing much more will happen. I guess having a fast SSD is paying off
here, creation times would have been 1s+ otherwise. Anyway... as you
can see by the load time, we load a ton more classes now and creation
is much more expensive, since we now load part of the meta class
system. Without any further changes, this would be the best Groovy can
achieve without actually changing the way scripts are done. But as you
can see as well, the memory behaviour is actually quite well. I would
have liked to test the ClassValue version, but it crashes almost
immediately after my changes.
Next step is then to use the simple "42" script:
...
46 596 17 52.0m 63.7m 36.9m
219.5m 0.840ms 76.407ms
47 609 1 21.7m 60.7m 11.5m
223.0m 0.835ms 76.536ms
48 621 13 43.0m 60.7m 27.0m
223.0m 0.830ms 76.434ms
we now load even more classes like CallSite and such, but creation
times are about the same meaning the overhead here is almost zero in
the end. Unsurprisingly the invokedynamic version behaves about the
same. Maybe the load times are a bit less good... which indicates more
classes being loaded internally in the JDK code to support invokedynamic.
Now a further step... I can change the script from being "42" to:
public class GroovyFilling {
public Object run() {
return 42;
}
}
so basically the same as the java version... This means Script will no
longer be used here!
46 638 28 74.9m 79.6m 50.7m
227.5m 0.626ms 71.473ms
47 650 10 35.4m 60.3m 41.1m
227.5m 0.631ms 71.630ms
48 665 25 68.4m 74.6m 47.4m
227.5m 0.626ms 71.512ms
well... we seem to load less classes, but we still get quite high
creation times. looking at the bytecode I find this (for the indy
version, but that does not really matter):
// access flags 0x1
public <init>()V
ALOAD 0
INVOKESPECIAL java/lang/Object.<init> ()V
L0
ALOAD 0
INVOKEVIRTUAL GroovyFilling.$getStaticMetaClass
()Lgroovy/lang/MetaClass;
ASTORE 1
ALOAD 1
ALOAD 0
SWAP
PUTFIELD GroovyFilling.metaClass : Lgroovy/lang/MetaClass;
ALOAD 1
POP
L1
RETURN
So we actually do an eager meta class creation here as well. The worst
part here is... we get this even with static compilation... which
absolutely is a bug.
So to improve this test here I guess trying to get rid of
$getStaticMetaClass and making Script (maybe even GroovyObjectSupport)
init meta classes lazy would be a gain. In the times of pre Groovy 1.8
this would not have amassed to much gain, since we did dynamic calls
all other the place. But since then we managed to reduce those calls
quite a lot, so I see an actual potential to reduce creation times for
simple cases. Of course in the more complex ones or in case in which
the super class constructor cannot be chosen statically by the
compiler, all bets are off again.
bye Jochen
.