RFR: JDK-8031043 ClassValue's backing map should have a smaller initial size

Peter Levart Thu, 09 Jun 2016 07:45:17 -0700

Hi,

The patch for this issue is a result of the following discussion onmlvm-dev list:


http://mail.openjdk.java.net/pipermail/mlvm-dev/2016-May/006602.html

I propose the following improvement to the implementation of ClassValue API:

http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3/

This is a re-implementation of ClassValue API that leverages aConcurrentMap for storage. ConcurrentHashMap was initially tested, butusing it regressed lookup performance for about ~10%. So a simpleConcurrentMap implementation (LinearProbeHashtable) was created whichhas smaller footprint, better CPU cache locality and consequently fasterlookup performance for the task at hand. The goal of this task was toshrink the memory footprint of ClassValue API. To see the end effectlet's first look at the results of a benchmark that simulates are-deployment of an application in an app server where several apps aredeployed and each uses it's own set of classes and ClassValue(s):


http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueExpungeBench.java

Benchmark (classValuesPerPart)(classesPerPart) (impl) (partitions) Mode Cnt Score Error UnitsClassValueExpungeBench.redeployPartition 8 1024jdk9 16 ss 16 65.682 ± 1.485 ms/opClassValueExpungeBench.redeployPartition 8 4096jdk9 16 ss 16 247.040 ± 7.684 ms/opClassValueExpungeBench.redeployPartition 64 1024jdk9 16 ss 16 302.536 ± 27.750 ms/opClassValueExpungeBench.redeployPartition 64 4096jdk9 16 ss 16 1174.002 ± 77.183 ms/op

Benchmark (classValuesPerPart) (classesPerPart) (impl) (partitions)Mode Cnt Score Error UnitsClassValueExpungeBench.redeployPartition 8 1024pl04.3 16 ss 16 47.179 ± 1.436 ms/opClassValueExpungeBench.redeployPartition 8 4096pl04.3 16 ss 16 163.067 ± 8.118 ms/opClassValueExpungeBench.redeployPartition 64 1024pl04.3 16 ss 16 67.581 ± 1.718 ms/opClassValueExpungeBench.redeployPartition 64 4096pl04.3 16 ss 16 240.458 ± 6.616 ms/op

A by-product of this simulation is a heap dump histogram taken after thelast warmup iteration when all "applications" are deployed:

top-10 classes heap dump for 64 ClassValue(s) x 4096 Class(es) x 16partitions


jdk9:

 num     #instances         #bytes  class name (module)
-------------------------------------------------------

1: 4194309 167772360 java.util.WeakHashMap$Entry(java.base@9-internal)2: 4195329 134250528 java.lang.ClassValue$Entry(java.base@9-internal)3: 65539 34603248 [Ljava.util.WeakHashMap$Entry;(java.base@9-internal)4: 65537 34603032 [Ljava.lang.ClassValue$Entry;(java.base@9-internal)

   5:         67301        7552448  java.lang.Class (java.base@9-internal)

6: 65536 4194304 java.lang.ClassValue$ClassValueMap(java.base@9-internal)7: 65552 2097664 java.lang.ref.ReferenceQueue(java.base@9-internal)8: 67736 1676344 [Ljava.lang.Object;(java.base@9-internal)

   9:         66349        1116848  [I (java.base@9-internal)

10: 65554 1048864 java.lang.ref.ReferenceQueue$Lock(java.base@9-internal)

-------------------------------------------------------
                         388915640 == 370 MiB

pl04.3:

 num     #instances         #bytes  class name (module)
-------------------------------------------------------

1: 133274 69833848 [Ljava.lang.Object;(java.base@9-internal)

   2:         67300        7552376  java.lang.Class (java.base@9-internal)

3: 65536 2097152 java.lang.ClassValue$ClassValueMap(java.base@9-internal)4: 65536 2097152java.lang.ClassValue$ClassValueMap$WeakNode (java.base@9-internal)

   5:         66349        1116848  [I (java.base@9-internal)
   6:         65991        1055856  java.lang.Object (java.base@9-internal)
   7:          7434         696584  [B (java.base@9-internal)

8: 1551 299680 [Ljava.lang.Class;(java.base@9-internal)9: 2447 176184 java.lang.reflect.Field(java.base@9-internal)

  10:          7154         171696  java.lang.String (java.base@9-internal)
-------------------------------------------------------
                          85097376 == 81 MiB

Footprint is reduced for more than 4 times. The main improvement of thisimplementation is that to support M Class(es) x N ClassValue(s), theonly part of the data structure that has O(M*N) space footprint are thebacking array(s) of LinearProbeHashtable. No less than 3 and no morethan 6 Object slots are used per Class x ClassValue association (4 inaverage). Other supporting parts of the data structure are either O(M),O(N) or O(1).


The lookup time has improved too with this patch (slightly):

http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueBench.java

http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3.bench_results.pdf

As to the LinearProbeHashtable, it has been updated from initial versionto implement the whole ConcurrentMap interface. Not much was needed forthat (~150 lines of heavy commented code) comparing to theimplementation in webrev.04.2(http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.2/).The end effect was that I was able to test the correctness of theimplementation using MOAT tests. Other changes to LinearProbeHashtablefrom webrev.04.2 (for those following the discussion on mlvm-dev) are:

- the lookup logic that was repeated in each and every method wasfactored-out into a lookup() private method.- further benchmarking revealed that is was not beneficial to havea 'skip' field in tombstone objects. Reading this field to optimizeskipping over a consecutive run of tombstones reduces cache-locality ofalgorithm. On average it is faster to just skip one tombstone at a timeand use a singleton tombstone object to be able to compare just thereference.- The following spin loop was added to get() method (and similar tosome other methods) to preserve SC properties of the methods:


    public V get(Object key) {
        Object[] tab = table;
        int i = lookup(tab, key);
        if (i >= 0) {
            Object value = getVolatile(tab, i + 1);
            if (value != null) {
                return (V) value;
            }
            // possible race with removing an entry can cause oldValue
            // to be observed cleared. If this happens, we know the key
            // has/will be tombstoned just after that but we must wait
            // for that to happen in order to preserve SC properties of
            // this method; without this wait one could even observe the
            // following in one thread:
            //
            //      table.get(k) == null && table.containsKey(k);
            //
            while (getVolatile(tab, i) != TOMBSTONE) {
                Thread.onSpinWait();
            }
        }
        return null;
    }

...the write this spin-loop is waiting on is in the remove() method(s):

                        tab[i + 1] = null; // clear the value...
                        tombstoned++;

putVolatile(tab, i, TOMBSTONE); // ...publishtombstone

The lookup performance of LinearProbeHashtable was put to test comparingit with ConcurrentHashMap in the following benchmark:


http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_src.tgz

Results:

http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_bench.pdf

In this benchmark, several variations are tested. The one fromwebrev.04.2 with stateful tombstones is called LinearProbeHashTable andthe one with singleton stateless tombstone, presented in this RFR aswebrev.04.3 is called LinearProbeHashTable1. The benchmark shows thateven during concurrent removals and insertions, the successful lookupperformance (GetExistent) does not degrade much and is on-par withConcurrentHashMap. This benchmark also explores the suitability of usingLinearProbeHashTable for keys derived from MethodType.

A note to reviewers: It is useless to compare the old implementationwith the new one side by side as they have not much in common (justpublic API and javadocs). Although the net line count increases withthis patch, the complexity of ClassValue implmenetation is reduced,because it is a layered implementation. ClassValue API is implemented interms of a well-known ConcurrentMap API and correctness of ConcurrentMapimplementation is verified by MOAT tests.


Regards, Peter

_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

RFR: JDK-8031043 ClassValue's backing map should have a smaller initial size

Reply via email to