Re: Improve Groovy class loading performance and memory management

Jochen Theodorou Fri, 13 May 2016 04:04:58 -0700

On 13.05.2016 02:22, Alain Stalder wrote:
[...]

Qualitatively this often has the following result in the Java VM:
Metaspace resp. PermGen, and Heap in parallel, just grow until a
configured limit is reached (and note that there is none by default for
Metaspace in Java 8 and later), often only then is it garbage collected.
With Java classes, at least with simple ones, this looks often
different, those appear to be garbage collected much more quickly.


Another qualitative difference is that loading a Groovy class and
instantiating it seems typically to be considerably slower than
instantiating a Java class with similar functionality, even quite
drastically so, more than one would expect even considering the need to
create metadata for dynamic function calls etc.

At least that has been my experience over the past few years.


this is going to be a long mail ;)

so let us make three things to discuss here:

1) Object initialization performance
2) class verification/complexity
3) garbage collection of unused classes

And we have to distinguish here between usages of ClassValue,invokedynamic and the traditional version of those... that makes 4 aspects.

So I will write several things you probably already know, but otherreading here might not. And even though I simplify a bit, please bearwith me ;) And first of all, let us talk about the meta class system...that mostly targets ClassValue then.

The old version of the meta class system uses a big global table for allmeta classes, with a class key and ClassInfo as value. In ClassInfo wehave the meta class, which might be either recreate-able (in which casethe meta class is soft referenced) or not (in which case it is a strongreference). The idea being, that as soon as the class can be garbagecollected, the ClassInfo can as well, and with it the meta class.

Problem 1 here is, the meta class holds a strong reference to the class,so if the ClassInfo holds a strong reference to the meta class, thisentry in the table will never be collected. I mention this only forcompleteness, since you did not set a permanent meta class in your test

Problem 2 here is, the code is concurrent, which rules out WeakHashMapand forced us to implement our own map.

In the ClassValue version we do not have our own table anymore and letthe JVM manage this.

To avoid the lookup cost of the meta class, every Groovy class has areference to its ClassInfo. The meta class and ClassInfo are lazyinitialized.... well, "populated with actual data" in case of ClassInfo.

Some classes extend GroovyObjectSupport, which does the initializationin the constructor already, groovy.lang.Script is one of them. Thatmeans every time you create an instance of a new script class, you getthe meta class already, even if the meta class is not used. let us havea small look at the bytecode of the constructors (x.groovy which onlyreturns 42) of such a script:

  // access flags 0x1
  public <init>()V
    ALOAD 0
    INVOKESPECIAL groovy/lang/Script.<init> ()V
   L0
    INVOKESTATIC x.$getCallSiteArray 
()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
    ASTORE 1
   L1
    RETURN

  // access flags 0x1
  public <init>(Lgroovy/lang/Binding;)V
   L0
    INVOKESTATIC x.$getCallSiteArray 
()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
    ASTORE 2
    ALOAD 0
    ALOAD 1
    INVOKESPECIAL groovy/lang/Script.<init> (Lgroovy/lang/Binding;)V
   L1
    RETURN

as you can see here, there is a getCallSiteArray call in here and directmethods calls for initialization. The getCallSiteArray call in here isactually surplus, but it is difficult to decide in the compiler early onif we will need it or not, because the callsite array here is basicallyan array wrapper, which supplies method names for method calls as wellas specialized code for doing dynamic method calls. Would theconstructor for example contain a method call to a method foo(), youwould see some thing like getting the CallSite and executing "call" on it.

Why are we doing static method calls in the constructor here? Because inseveral cases the compiler optimizes the dynamic call away here.Basically you cannot provide a super constructor in Groovy, which meansonly the one statically defined does count. And as long as the giventypes are matching enough for the compiler to decide, we can createdirect method calls.

So what does this mean for the object initialization performance incases of Scripts so far? We eagerly initialize ClassInfo and MetaClassfor each script. That means a lookup by reflection of the completestructure of the class and its super classes... which is cached, so thesuper class lookup will be much lower the next time. initializing thefirst meta class will also initialize the extension method lookup, whichcan also take a bit time... again, a one-time cost here.

The invokedynamic version will again use the jdk internals for thecallsites, thus $getCallSiteArray is never called nor are the methodsaround this created. This saves bytecode size (faster verification), andin fact the two mentioned constructors above would look just like theywould in Java. This of course still implies the meta class system init.

Now... the actual code of the script is to be found in the run method,which looks like this in old groovy:

  // access flags 0x1
  public run()Ljava/lang/Object;
   L0
    INVOKESTATIC x.$getCallSiteArray 
()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
    ASTORE 1
   L1
    LINENUMBER 1 L1
    BIPUSH 42
    INVOKESTATIC java/lang/Integer.valueOf (I)Ljava/lang/Integer;
    ARETURN
   L2
    ACONST_NULL
    ARETURN

again we see the call to get the callsite array. But otherwise themethod should like this in Java too. Since we are not calling any actualmethods and just return 42, we have again needless overhead in this.Accordingly the invokedynamic version looks like this:

  // access flags 0x1
  public run()Ljava/lang/Object;
   L0
    LINENUMBER 1 L0
    BIPUSH 42
    INVOKESTATIC java/lang/Integer.valueOf (I)Ljava/lang/Integer;
    ARETURN
   L1
   FRAME FULL [] [java/lang/Throwable]
    NOP
    ATHROW

Frames appear in this one, because the old groovy bytecode is java5compatible, while invokedynamic requires java7 and thus the new fasterverifier with frames. The frames basically support the verifier to makeit faster.... I never tested if that is actually the case.

To sum it up a little so far.... the invokedynamic version uses lessbytecode and the new verifier 1685 vs 2140, both could execute thatscript without ever using the meta class system, if run is calleddirectly and if not Script would initialize the meta class eagerly. Ofcourse the later is for nil as soon as you have some method calls inthere, which are not static.


Since we now have some basics, we should start talking about your test case.

First thing... I don´t think you load the class new every time. (Takenfrom my machine, with java8u25 and Groovy 2.4.6)

   0         1           1      10.7m      11.4m        14.4m    240.0m     
0.105ms     0.061ms
   1    760081           1      11.4m      11.8m        28.3m    246.5m     
0.000ms     0.000ms

I see here the first time class loading with 105 and a creation timewith 61. This class loading time has to appear every time and must neverbe 0. But that is the case. So even if you create a new URLClassLoaderwithout defining a root, it still has the system class loader as parent.Providing null as parent class loader fixes that problem. And then I gota more interesting:

   0         1           1      10.6m      11.4m        14.4m    240.0m     
0.213ms     0.063ms
   1     11519        2227      16.2m      35.9m        24.2m    246.0m     
0.060ms     0.022ms
   2     27103        4055      19.4m      46.8m        40.7m    229.0m     
0.051ms     0.019ms
   3     42465           1      12.2m      34.4m         4.5m    235.0m     
0.049ms     0.019ms
   4     57606         580      13.3m      34.4m         9.0m    236.0m     
0.048ms     0.018ms
   5     73506        1918      15.8m      39.1m        21.3m    236.0m     
0.047ms     0.018ms
   6     89212        2482      16.8m      43.0m        25.8m    212.5m     
0.047ms     0.018ms
   7    104578           1      12.4m      41.7m         4.0m    232.0m     
0.046ms     0.018ms

actually this was with java5 compatible code... with java8 standard Iget something like this:

   0         1           1      10.6m      11.4m        14.4m    240.0m     
0.214ms     0.029ms
   1     12117        2861      17.3m      38.4m        30.4m    246.0m     
0.062ms     0.016ms
   2     29143        1303      14.6m      36.6m        15.8m    236.0m     
0.052ms     0.014ms
   3     46633        4267      19.8m      48.3m        43.2m    235.0m     
0.048ms     0.013ms
   4     62634         900      14.0m      35.2m        11.4m    236.0m     
0.048ms     0.013ms
   5     80618        4358      20.1m      48.7m        43.5m    235.5m     
0.046ms     0.013ms
   6     97964        2336      16.6m      41.0m        25.3m    236.5m     
0.046ms     0.012ms

This makes me actually wondering a bit as of why the creation time goesdown, but not the load time. And after some thinking things become clearto me...when you load a class using loadClass, the class is not verifiedimmediately. This happens lazy on actual static initialization or onceyou ask the class for its structure... like when using getMethods. Sowhat load time tests here is purely the class loader performance,nothing else. A difference in load time would thus indicate solely thatmore or less classes are used. This also means the creation time hereincludes the verification time of the class, which I guess amounts tothe .006 in difference between the two.

Now lets get further and change the simple java class with a run methodto extend Groovy´s script class.. well this requires further changes,since now of course giving the groovy jar on the classpath will nolonger enable the spawned URLClassloader to load groovy classes.Anyway... after fixing that:

   0         1           1      13.4m      13.8m        24.0m    240.0m     
1.290ms    90.612ms
   1        12          12      39.3m      40.1m        64.3m    190.0m     
0.985ms    82.673ms
   2        23          23      65.0m      66.5m        56.0m    246.0m     
1.024ms    88.072ms
   3        34           5      29.3m      46.8m        27.6m    230.5m     
0.976ms    88.245ms
   4        48          19      54.8m      64.0m        39.2m    228.0m     
0.929ms    83.359ms
   5        60           1      19.6m      51.3m        13.2m    228.0m     
0.892ms    84.574ms
   6        73          14      42.6m      55.8m        36.8m    228.0m     
0.859ms    82.042ms

...

  46       593           1      21.8m      62.2m         7.9m    220.5m     
0.783ms    76.878ms
  47       607          15      47.0m      62.3m        41.8m    219.0m     
0.787ms    76.695ms
  48       621          29      78.3m      82.5m        44.5m    220.5m     
0.782ms    76.563ms

nothing much more will happen. I guess having a fast SSD is paying offhere, creation times would have been 1s+ otherwise. Anyway... as you cansee by the load time, we load a ton more classes now and creation ismuch more expensive, since we now load part of the meta class system.Without any further changes, this would be the best Groovy can achievewithout actually changing the way scripts are done. But as you can seeas well, the memory behaviour is actually quite well. I would have likedto test the ClassValue version, but it crashes almost immediately aftermy changes.


Next step is then to use the simple "42" script:
...

  46       596          17      52.0m      63.7m        36.9m    219.5m     
0.840ms    76.407ms
  47       609           1      21.7m      60.7m        11.5m    223.0m     
0.835ms    76.536ms
  48       621          13      43.0m      60.7m        27.0m    223.0m     
0.830ms    76.434ms

we now load even more classes like CallSite and such, but creation timesare about the same meaning the overhead here is almost zero in the end.Unsurprisingly the invokedynamic version behaves about the same. Maybethe load times are a bit less good... which indicates more classes beingloaded internally in the JDK code to support invokedynamic.


Now a further step... I can change the script from being "42" to:

public class GroovyFilling {
  public Object run() {
    return 42;
  }
}

so basically the same as the java version... This means Script will nolonger be used here!

  46       638          28      74.9m      79.6m        50.7m    227.5m     
0.626ms    71.473ms
  47       650          10      35.4m      60.3m        41.1m    227.5m     
0.631ms    71.630ms
  48       665          25      68.4m      74.6m        47.4m    227.5m     
0.626ms    71.512ms

well... we seem to load less classes, but we still get quite highcreation times. looking at the bytecode I find this (for the indyversion, but that does not really matter):

  // access flags 0x1
  public <init>()V
    ALOAD 0
    INVOKESPECIAL java/lang/Object.<init> ()V
   L0
    ALOAD 0
    INVOKEVIRTUAL GroovyFilling.$getStaticMetaClass ()Lgroovy/lang/MetaClass;
    ASTORE 1
    ALOAD 1
    ALOAD 0
    SWAP
    PUTFIELD GroovyFilling.metaClass : Lgroovy/lang/MetaClass;
    ALOAD 1
    POP
   L1
    RETURN

So we actually do an eager meta class creation here as well. The worstpart here is... we get this even with static compilation... whichabsolutely is a bug.

So to improve this test here I guess trying to get rid of$getStaticMetaClass and making Script (maybe even GroovyObjectSupport)init meta classes lazy would be a gain. In the times of pre Groovy 1.8this would not have amassed to much gain, since we did dynamic calls allother the place. But since then we managed to reduce those calls quite alot, so I see an actual potential to reduce creation times for simplecases. Of course in the more complex ones or in case in which the superclass constructor cannot be chosen statically by the compiler, all betsare off again.


bye Jochen

Re: Improve Groovy class loading performance and memory management

Reply via email to