Hi Jens,
On 2017-04-21 13:03, Jens Wilke wrote:
Hi Stefan,
On Thursday, 20 April 2017 19:59:30 ICT Stefan Johansson wrote:
Thanks for reaching out and for providing such a good step-by-step guide
on how to run the benchmark the same way you do.
Thanks for the quick reply!
I've tried with CMS, G1 and Parallel, both with 10g and 20g heap, but so
far I can't reproduce your problems. It would be great if you could
provide us with some more information, for example GC logs and the
result files. We might be able to dig something out of them.
The logs from the measurement on my notebook for the first mail (see below) are
available at (valid for 30 days only):
http://ovh.to/FzKbgrb
What environment are you testing on?
I only did some quick testing on my desktop, which has 12 cores and
hyper-threading (24 hardware threads), so the default is to use 18 parallel
GC threads on my system.
Please mind the core count. My gut tells me that it could have something to
do with the hash table arrays. When you test on a system that reports
more than 8 cores, the allocated arrays will be smaller than in my case, since
the cache segments its hash tables based on the core count.
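To illustrate what I mean, here is a minimal sketch of per-segment array sizing
(just an illustration with made-up numbers, not the actual cache2k sizing code):

  // Sketch only; not the actual cache2k sizing logic.
  public class SegmentSizingSketch {
    public static void main(String[] args) {
      int cpus = Runtime.getRuntime().availableProcessors();
      // assume roughly one segment per reported CPU, rounded down to a power of two
      int segments = Integer.highestOneBit(Math.max(1, cpus));
      int totalEntryCapacity = 2_000_000;  // arbitrary example capacity
      int arraySizePerSegment = totalEntryCapacity / segments;
      System.out.println("cpus=" + cpus + " segments=" + segments
          + " arraySizePerSegment=" + arraySizePerSegment);
    }
  }

With 8 reported cores this sketch yields 8 segments of 250,000 slots each, with 24
reported cores it yields 16 segments of only 125,000 slots, so the array sizes the
GC has to deal with change with the core count.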
From my runs it looks like G1 is about 5-10% behind CMS and 10-15%
behind Parallel for both JDK 8 and 9.
That seems okay.
Actually, I'd like to publish my next benchmark results, but I am somewhat
stuck with this issue now. Benchmarking with CMS only doesn't really make sense
at the current point in time, and I don't like being in doubt about whether there
is something wrong in the setup.
I took a quick look at the blog as well, and there the system had 32GB
RAM and the runs were done with -Xmx10g. The system you describe here
only has 20GB RAM and you are using -Xmx20G, is that correct or is there
a typo?
My bad, sorry for the confusion. There was enough free memory and RSS was only
at 6GB, so the system was not swapping. I did play with the parameters to see
whether it made a difference, but forgot to put them back into a reasonable range
before sending the report.
The effects on the isolated benchmark system with 32GB and -Xmx10g or -Xmx20G
are the same (see the blog article for the parameters).
The hot spot seems to be the function OtherRegionsTable::add_reference.
When I run with -prof perfasm and Java 8u121, with and without G1, on the
benchmark system I get this:
.../jdk1.8.0_121/bin/java -jar jmh-suite/target/benchmarks.jar \\.RandomSequenceBenchmark \
  -jvmArgs -server\ -Xmx20G\ -XX:BiasedLockingStartupDelay=0\ -verbose:gc\ -XX:+PrintGCDetails \
  -f 1 -wi 1 -w 20s -i 1 -r 20s -t 4 \
  -prof org.cache2k.benchmark.jmh.LinuxVmProfiler \
  -prof org.cache2k.benchmark.jmh.MiscResultRecorderProfiler \
  -p cacheFactory=org.cache2k.benchmark.Cache2kFactory \
  -rf json -rff result.json -prof perfasm
....[Hottest Methods (after inlining)]..............................................................
 22.48%   6.61%  C2, level 4    org.cache2k.core.AbstractEviction::removeAllFromReplacementListOnEvict, version 897
 21.06%   8.18%  C2, level 4    org.cache2k.core.HeapCache::insertNewEntry, version 913
  9.38%   7.13%  libjvm.so      SpinPause
  9.13%   9.54%  C2, level 4    org.cache2k.benchmark.jmh.suite.eviction.symmetrical.generated.RandomSequenceBenchmark_operation_jmhTest::operation_thrpt_jmhStub, version 873
  5.00%   3.88%  libjvm.so      _ZN13InstanceKlass17oop_push_contentsEP18PSPromotionManagerP7oopDesc
  4.54%   4.70%  perf-5104.map  [unknown]
  3.57%   3.86%  C2, level 4    org.cache2k.core.AbstractEviction::removeFromHashWithoutListener, version 838
  2.86%  15.40%  libjvm.so      _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_
  2.48%  12.53%  libjvm.so      _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread
  2.46%   1.72%  C2, level 4    org.cache2k.core.AbstractEviction::refillChunk, version 906
  2.31%   3.33%  libjvm.so      _ZN18PSPromotionManager22copy_to_survivor_spaceILb0EEEP7oopDescS2_
  2.24%   6.37%  C2, level 4    java.util.concurrent.locks.StampedLock::acquireRead, version 864
  2.03%   2.89%  libjvm.so      _ZN18PSPromotionManager18drain_stacks_depthEb
  1.44%   1.43%  libjvm.so      _ZN13ObjArrayKlass17oop_push_contentsEP18PSPromotionManagerP7oopDesc
  1.29%   0.72%  kernel         [unknown]
  1.03%   1.27%  libjvm.so      _ZN18CardTableExtension26scavenge_contents_parallelEP16ObjectStartArrayP12MutableSpaceP8HeapWordP18PSPromotionManagerjj
  0.79%   1.53%  C2, level 4    java.util.concurrent.locks.StampedLock::acquireWrite, version 865
  0.74%   4.21%  runtime stub   StubRoutines::SafeFetch32
  0.71%   0.50%  C2, level 4    org.cache2k.core.ClockProPlusEviction::sumUpListHits, version 772
  0.70%   0.39%  libc-2.19.so   __clock_gettime
  3.76%   3.73%  <...other 147 warm methods...>
....................................................................................................
100.00%  99.93%  <totals>
.../jdk1.8.0_121/bin/java -jar jmh-suite/target/benchmarks.jar \\.RandomSequenceBenchmark \
  -jvmArgs -server\ -Xmx20G\ -XX:BiasedLockingStartupDelay=0\ -verbose:gc\ -XX:+PrintGCDetails\ -XX:+UseG1GC \
  -f 1 -wi 1 -w 20s -i 1 -r 20s -t 4 \
  -prof org.cache2k.benchmark.jmh.LinuxVmProfiler \
  -prof org.cache2k.benchmark.jmh.MiscResultRecorderProfiler \
  -p cacheFactory=org.cache2k.benchmark.Cache2kFactory \
  -rf json -rff result.json -prof perfasm
....[Hottest Methods (after inlining)]..............................................................
 49.11%  41.16%  libjvm.so      _ZN17OtherRegionsTable13add_referenceEPvi
 10.25%   3.37%  C2, level 4    org.cache2k.core.ClockProPlusEviction::removeFromReplacementListOnEvict, version 883
  4.93%   1.43%  C2, level 4    org.cache2k.core.SegmentedEviction::submitWithoutEviction, version 694
  4.31%   5.89%  libjvm.so      _ZN29G1UpdateRSOrPushRefOopClosure6do_oopEPj
  3.18%   4.17%  libjvm.so      _ZN13ObjArrayKlass20oop_oop_iterate_nv_mEP7oopDescP24FilterOutOfRegionClosure9MemRegion
  3.17%   3.00%  libjvm.so      _ZN29G1BlockOffsetArrayContigSpace18block_start_unsafeEPKv
  2.95%   3.16%  perf-5226.map  [unknown]
  2.19%   1.00%  C2, level 4    org.cache2k.benchmark.Cache2kFactory$1::getIfPresent, version 892
  1.58%   1.50%  libjvm.so      _ZN8G1RemSet11refine_cardEPajb
  1.42%   5.02%  libjvm.so      _ZNK10HeapRegion12block_is_objEPK8HeapWord
  1.41%   3.31%  libjvm.so      _ZN10HeapRegion32oops_on_card_seq_iterate_carefulE9MemRegionP24FilterOutOfRegionClosurebPa
  1.13%   3.05%  libjvm.so      _ZN13InstanceKlass18oop_oop_iterate_nvEP7oopDescP24FilterOutOfRegionClosure
  0.98%   0.51%  libjvm.so      _ZN14G1HotCardCache6insertEPa
  0.89%   4.27%  libjvm.so      _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_
  0.85%   1.17%  C2, level 4    org.cache2k.core.HeapCache::insertNewEntry, version 899
  0.74%   3.59%  libjvm.so      _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread
  0.74%   0.57%  libjvm.so      _ZN20G1ParScanThreadState10trim_queueEv
  0.70%   0.70%  C2, level 4    org.cache2k.core.Hash2::remove, version 864
  0.69%   0.81%  C2, level 4    org.cache2k.core.ClockProPlusEviction::findEvictionCandidate, version 906
  0.65%   1.59%  C2, level 4    org.cache2k.benchmark.jmh.suite.eviction.symmetrical.generated.RandomSequenceBenchmark_operation_jmhTest::operation_thrpt_jmhStub, version 857
  8.14%  10.65%  <...other 331 warm methods...>
....................................................................................................
100.00%  99.91%  <totals>
As I mentioned in my reply to your other mail, these calls are caused by
region-to-region pointers in G1. Adding those references can be done
either during a safepoint or concurrently. Looking at your profile it
seems that most calls come from the concurrent path, and since your
system has few cores, having the concurrent refinement threads do a
lot of work has a bigger impact on the overall performance.
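If you want to see how much of that refinement work is going on, one option is to
add the remembered set summary diagnostics to the run and to experiment with the
number of refinement threads. This is only a sketch of possible flags for a JDK 8
HotSpot build, and the thread count of 2 is just an arbitrary example value:

  -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=1
  -XX:G1ConcRefinementThreads=2

The summary should give an idea of how much of the card refinement work is done
concurrently versus during the pauses.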
Stefan
Best,
Jens
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use