Including build-dev since this patch also addresses the newly-filed issue JDK-8248135:


https://bugs.openjdk.java.net/browse/JDK-8248135


So here's a new webrev with a patch for building benchmarks with --enable-preview included:


http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.08/


Regards, Peter


On 6/23/20 10:23 AM, Claes Redestad wrote:


On 2020-06-23 10:06, Claes Redestad wrote:
Hi,

On 2020-06-23 09:49, Peter Levart wrote:
Hi Chris, Claes,


Ok then, here's with benchmark included:


http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.07/


I haven't been able to run the benchmark with "make test", though. I have tried various ways to pass javac options to the build, like:


make test TEST='micro:org.openjdk.bench.java.io.RecordDeserialization' TEST_OPTS="VM_OPTIONS=--enable-preview --release=16"


...but javac doesn't seem to get them. Is there some secret option to achieve that?

Hmm, we might as well have the microbenchmarks build with
--enable-preview on by default. Try this:

Fixed:

diff -r f2e1cd498381 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:08:35 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:33:17 2020 +0200
@@ -90,10 +90,11 @@
     TARGET_RELEASE := $(TARGET_RELEASE_NEWJDK_UPGRADED), \
     SMALL_JAVA := false, \
     CLASSPATH := $(MICROBENCHMARK_CLASSPATH), \
-    DISABLED_WARNINGS := processing rawtypes cast serial, \
+    DISABLED_WARNINGS := processing rawtypes cast serial preview, \
     SRC := $(MICROBENCHMARK_SRC), \
     BIN := $(MICROBENCHMARK_CLASSES), \
     JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules java.management, \
+    JAVAC_FLAGS := --enable-preview, \
 ))

 $(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)

I verified this works with your micro, and doesn't seem to cause
any issues elsewhere:

 make test TEST=micro:RecordDeserialization

I can shepherd this as a separate fix for documentation purposes, but
feel free to include it in your patch and ping build-dev@..

/Claes


diff -r 52741f85bf23 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk    Tue Jun 23 09:54:42 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk    Tue Jun 23 09:59:29 2020 +0200
@@ -93,7 +93,7 @@
      DISABLED_WARNINGS := processing rawtypes cast serial, \
      SRC := $(MICROBENCHMARK_SRC), \
      BIN := $(MICROBENCHMARK_CLASSES), \
-    JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules java.management, \
+    JAVA_FLAGS := --enable-preview --add-modules jdk.unsupported --limit-modules java.management, \
  ))

  $(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)



Otherwise, I simulated what happens when there are more than 10 ObjectStreamClass layouts for the same class rapidly interchanging, so that they push each other out of the cache, by temporarily setting the cache's MAX_SIZE = 0. Note that this is the worst-case scenario:


Benchmark                                      (length)  Mode  Cnt    Score    Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10    9.393 ±  0.287  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10   35.642 ±  0.977  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10  293.769 ±  7.321  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10   15.335 ±  0.496  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  211.427 ± 11.908  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  990.398 ± 26.681  us/op


This is using JMH option '-gc true' to force GC after each iteration of benchmark. Without it, I get a big (~4s) full-GC pause just in the middle of a run with length=100:


Iteration   1: 528.577 us/op
Iteration   2: 580.404 us/op
Iteration   3: 4438.228 us/op
Iteration   4: 644.532 us/op
Iteration   5: 698.493 us/op
Iteration   6: 800.738 us/op
Iteration   7: 929.791 us/op
Iteration   8: 870.946 us/op
Iteration   9: 863.416 us/op
Iteration  10: 916.508 us/op


...so results are a bit off because of that:


Benchmark                                      (length)  Mode  Cnt     Score      Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10     8.263 ±    0.043  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10    33.406 ±    0.160  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10   287.595 ±    0.960  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10    15.270 ±    0.080  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  1127.163 ± 1771.892  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  2003.235 ±  227.159  us/op


Yes, there is quite a bit of GCing going on when the cache is thrashing. Note that I haven't tuned the GC in any way, and I'm running this on a machine with 64 GiB of RAM, so the heap is allowed to grow quite big; G1 is used by default, I think.


This is still no worse than before the patch:


Benchmark                                 (length)  Mode  Cnt     Score   Error  Units
RecordDeserialization.deserializeClasses        10  avgt   10     8.382 ±  0.013  us/op
RecordDeserialization.deserializeClasses       100  avgt   10    33.736 ±  0.171  us/op
RecordDeserialization.deserializeClasses      1000  avgt   10   271.224 ±  0.953  us/op
RecordDeserialization.deserializeRecords        10  avgt   10    58.606 ±  0.446  us/op
RecordDeserialization.deserializeRecords       100  avgt   10   530.044 ±  1.752  us/op
RecordDeserialization.deserializeRecords      1000  avgt   10  5335.624 ± 44.942  us/op


... since caching of the adapted method handle for multiple objects within a single stream is still effective.

I doubt there will ever be more than 10 variants/versions of the same record class deserialized by the same VM in rapid succession, so I think this should not be an issue. We could add a system property to control the MAX_SIZE of the cache if you think it is needed.
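For illustration, the bounded-cache behaviour being discussed (once more than MAX_SIZE distinct layouts are in flight, they evict each other, and with MAX_SIZE = 0 every lookup misses) can be sketched with a plain LinkedHashMap in access order. The class name, the property name, and the default of 10 are hypothetical stand-ins, not the actual JDK implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLayoutCache<K, V> extends LinkedHashMap<K, V> {
    // Hypothetical property name for illustration only; the proposed JDK
    // patch uses a hard-coded constant for MAX_SIZE.
    static final int MAX_SIZE =
            Integer.getInteger("jdk.serialization.layoutCacheSize", 10);

    public BoundedLayoutCache() {
        super(16, 0.75f, true); // access order => LRU eviction
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used layout once the bound is exceeded.
        return size() > MAX_SIZE;
    }

    public static void main(String[] args) {
        BoundedLayoutCache<Integer, String> cache = new BoundedLayoutCache<>();
        // 12 rapidly interchanging layouts push the oldest two out.
        for (int i = 0; i < 12; i++) cache.put(i, "layout-" + i);
        System.out.println(cache.size());         // 10
        System.out.println(cache.containsKey(0)); // false: evicted
    }
}
```

With MAX_SIZE set to 0, every put immediately evicts, which mirrors the worst-case simulation in the numbers above.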

Thanks for running the numbers on this, and I agree - it seems
outlandishly improbable (most will only see one, and if you have to
maintain 10+ different serialized shapes of some record you likely have bigger problems).

I'd say let's keep it constant unless someone actually asks for it.

/Claes



Regards, Peter


On 6/22/20 1:04 AM, Claes Redestad wrote:
Hi Peter,

patch and results look great!

My only real comment on this is that I think the microbenchmark would be
a valuable contribution, too.

It'd also be interesting to explore how poor performance would become if
we hit the (artificial) 11-layouts limit, e.g. by cycling through
10, 11, or 12 different shapes.

/Claes

On 2020-06-21 19:16, Peter Levart wrote:
Hi,


When re-running the benchmark [1] with different lengths of serialized arrays of records, I found that, compared to classical classes, the lookup into the cache of adapted method handles starts to show when the length of the array is larger (i.e. the number of instances of the same record type deserialized in a single stream). Each record deserialized must look up the method handle in a ConcurrentHashMap:


Benchmark                                    (length)  Mode  Cnt    Score  Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.088 ± 0.013  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.171 ± 0.324  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.762 ± 3.072  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    9.011 ± 0.027  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   33.206 ± 0.514  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  325.137 ± 0.969  us/op


...so keeping the correctly shaped adapted method handle in the per-serialization-session ObjectStreamClass instance [2] starts to make sense:


Benchmark                                    (length)  Mode  Cnt    Score  Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.681 ± 0.155  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.496 ± 0.087  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.014 ± 1.189  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    8.537 ± 0.032  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   31.451 ± 0.083  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  250.854 ± 2.772  us/op


With that, deserializing more objects becomes an advantage over classical classes instead of a disadvantage.
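As a rough illustration of the per-session caching idea (this is not the actual ObjectStreamClass code; the key type and the adaptation step are simplified stand-ins), the shared ConcurrentHashMap is consulted once per session, after which a session-local field serves every subsequent record in the stream:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.ConcurrentHashMap;

public class PerSessionHandleCache {
    // Shared cache: one lookup per distinct stream shape, across sessions.
    private static final ConcurrentHashMap<String, MethodHandle> SHARED =
            new ConcurrentHashMap<>();

    // Per-session holder, mirroring the idea of stashing the adapted
    // handle in the per-stream ObjectStreamClass instance.
    private MethodHandle sessionHandle;

    MethodHandle handleFor(String shapeKey) {
        MethodHandle h = sessionHandle;              // fast path: field read
        if (h == null) {
            h = SHARED.computeIfAbsent(shapeKey, k -> adapt());
            sessionHandle = h;                       // reuse for rest of session
        }
        return h;
    }

    // Stand-in for the real "adapt canonical constructor to stream shape"
    // step; here it just binds a trivial static method.
    private static MethodHandle adapt() {
        try {
            return MethodHandles.lookup().findStatic(
                    Integer.class, "sum",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        PerSessionHandleCache session = new PerSessionHandleCache();
        // Only the first call touches the ConcurrentHashMap.
        int r = (int) session.handleFor("Point[x,y]").invokeExact(2, 3);
        System.out.println(r); // 5
    }
}
```

Deserializing a length-N array of the same record then costs one map lookup plus N-1 field reads, which is consistent with the improvement shown in the numbers above.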


[1] http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/RecordSerializationBench.java

[2] http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.06/


Regards, Peter

