Hi Jens,
On 2017-04-21 13:03, Jens Wilke wrote:
Hi Stefan,
On Thursday, 20 April 2017 19:59:30 ICT Stefan Johansson wrote:
Thanks for reaching out and for providing such a good step-by-step guide
on how to run the benchmark the same way you do.
Thanks for the quick reply!
I've tried with CMS, G1 and Parallel, both with 10g and 20g heap, but so
far I can't reproduce your problems. It would be great if you could
provide us with some more information, for example GC logs and the
result files. We might be able to dig something out of them.
The logs from the measurement on my notebook for the first mail (see below) are
available at (valid for 30 days only):
http://ovh.to/FzKbgrb
What environment are you testing on?
I only did some quick testing on my desktop, which has 12 cores and
hyper-threading (24 hardware threads), so the default is to use 18 parallel
GC threads on my system.
Please mind the core count. My gut tells me that it could have something to
do with the hash table arrays. When you test on a system that reports
more than 8 cores, the allocated arrays will be smaller than in my case, since
the cache segments its hash tables based on the core count.
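To illustrate what I mean, here is a minimal sketch of per-segment array sizing
(just an illustration with made-up numbers, not the actual cache2k sizing code):

  // Sketch only; not the actual cache2k sizing logic.
  public class SegmentSizingSketch {
    public static void main(String[] args) {
      int cpus = Runtime.getRuntime().availableProcessors();
      // assume roughly one segment per reported CPU, rounded down to a power of two
      int segments = Integer.highestOneBit(Math.max(1, cpus));
      int totalEntryCapacity = 2_000_000;  // arbitrary example capacity
      int arraySizePerSegment = totalEntryCapacity / segments;
      System.out.println("cpus=" + cpus + " segments=" + segments
          + " arraySizePerSegment=" + arraySizePerSegment);
    }
  }

With 8 reported cores this sketch yields 8 segments of 250,000 slots each, with 24
reported cores it yields 16 segments of only 125,000 slots, so the array sizes the
GC has to deal with change with the core count.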
From my runs it looks like G1 is about 5-10% behind CMS and 10-15%
behind Parallel for both JDK 8 and 9.
That seems okay.
Actually, I'd like to publish my next benchmark results, but I am somewhat
stuck with this issue now. Benchmarking with CMS only doesn't really make sense
at the current point in time, and I don't like being in doubt about whether there
is something wrong in the setup.
I took a quick look at the blog as well, and there the system had 32GB
RAM and the runs were done with -Xmx10g. The system you describe here
only has 20GB RAM and you are using -Xmx20G, is that correct or is there
a typo?
My bad, sorry for the confusion. There was enough free memory and RSS was only
at 6GB, so the system was not swapping. I did play with the parameters to see
whether it made a difference, but forgot to put them back into a reasonable range
before sending the report.
The effects on the isolated benchmark system with 32GB and -Xmx10g or -Xmx20G
are the same (see the blog article for the parameters).
The hot spot seems to be the function OtherRegionsTable::add_reference.
When I run with -prof perfasm and Java 8u121, with and without G1, on the
benchmark system I get this:
.../jdk1.8.0_121/bin/java -jar jmh-suite/target/benchmarks.jar \\.RandomSequenceBenchmark \
  -jvmArgs -server\ -Xmx20G\ -XX:BiasedLockingStartupDelay=0\ -verbose:gc\ -XX:+PrintGCDetails \
  -f 1 -wi 1 -w 20s -i 1 -r 20s -t 4 \
  -prof org.cache2k.benchmark.jmh.LinuxVmProfiler \
  -prof org.cache2k.benchmark.jmh.MiscResultRecorderProfiler \
  -p cacheFactory=org.cache2k.benchmark.Cache2kFactory \
  -rf json -rff result.json -prof perfasm
....[Hottest Methods (after inlining)]..............................................................
 22.48%   6.61%  C2, level 4    org.cache2k.core.AbstractEviction::removeAllFromReplacementListOnEvict, version 897
 21.06%   8.18%  C2, level 4    org.cache2k.core.HeapCache::insertNewEntry, version 913
  9.38%   7.13%  libjvm.so      SpinPause
  9.13%   9.54%  C2, level 4    org.cache2k.benchmark.jmh.suite.eviction.symmetrical.generated.RandomSequenceBenchmark_operation_jmhTest::operation_thrpt_jmhStub, version 873
  5.00%   3.88%  libjvm.so      _ZN13InstanceKlass17oop_push_contentsEP18PSPromotionManagerP7oopDesc
  4.54%   4.70%  perf-5104.map  [unknown]
  3.57%   3.86%  C2, level 4    org.cache2k.core.AbstractEviction::removeFromHashWithoutListener, version 838
  2.86%  15.40%  libjvm.so      _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_
  2.48%  12.53%  libjvm.so      _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread
  2.46%   1.72%  C2, level 4    org.cache2k.core.AbstractEviction::refillChunk, version 906
  2.31%   3.33%  libjvm.so      _ZN18PSPromotionManager22copy_to_survivor_spaceILb0EEEP7oopDescS2_
  2.24%   6.37%  C2, level 4    java.util.concurrent.locks.StampedLock::acquireRead, version 864
  2.03%   2.89%  libjvm.so      _ZN18PSPromotionManager18drain_stacks_depthEb
  1.44%   1.43%  libjvm.so      _ZN13ObjArrayKlass17oop_push_contentsEP18PSPromotionManagerP7oopDesc
  1.29%   0.72%  kernel         [unknown]
  1.03%   1.27%  libjvm.so      _ZN18CardTableExtension26scavenge_contents_parallelEP16ObjectStartArrayP12MutableSpaceP8HeapWordP18PSPromotionManagerjj
  0.79%   1.53%  C2, level 4    java.util.concurrent.locks.StampedLock::acquireWrite, version 865
  0.74%   4.21%  runtime stub   StubRoutines::SafeFetch32
  0.71%   0.50%  C2, level 4    org.cache2k.core.ClockProPlusEviction::sumUpListHits, version 772
  0.70%   0.39%  libc-2.19.so   __clock_gettime
  3.76%   3.73%  <...other 147 warm methods...>
....................................................................................................
100.00%  99.93%  <totals>
.../jdk1.8.0_121/bin/java -jar jmh-suite/target/benchmarks.jar \\.RandomSequenceBenchmark \
  -jvmArgs -server\ -Xmx20G\ -XX:BiasedLockingStartupDelay=0\ -verbose:gc\ -XX:+PrintGCDetails\ -XX:+UseG1GC \
  -f 1 -wi 1 -w 20s -i 1 -r 20s -t 4 \
  -prof org.cache2k.benchmark.jmh.LinuxVmProfiler \
  -prof org.cache2k.benchmark.jmh.MiscResultRecorderProfiler \
  -p cacheFactory=org.cache2k.benchmark.Cache2kFactory \
  -rf json -rff result.json -prof perfasm
....[Hottest Methods (after inlining)]..............................................................
 49.11%  41.16%  libjvm.so      _ZN17OtherRegionsTable13add_referenceEPvi
 10.25%   3.37%  C2, level 4    org.cache2k.core.ClockProPlusEviction::removeFromReplacementListOnEvict, version 883
  4.93%   1.43%  C2, level 4    org.cache2k.core.SegmentedEviction::submitWithoutEviction, version 694
  4.31%   5.89%  libjvm.so      _ZN29G1UpdateRSOrPushRefOopClosure6do_oopEPj
  3.18%   4.17%  libjvm.so      _ZN13ObjArrayKlass20oop_oop_iterate_nv_mEP7oopDescP24FilterOutOfRegionClosure9MemRegion
  3.17%   3.00%  libjvm.so      _ZN29G1BlockOffsetArrayContigSpace18block_start_unsafeEPKv
  2.95%   3.16%  perf-5226.map  [unknown]
  2.19%   1.00%  C2, level 4    org.cache2k.benchmark.Cache2kFactory$1::getIfPresent, version 892
  1.58%   1.50%  libjvm.so      _ZN8G1RemSet11refine_cardEPajb
  1.42%   5.02%  libjvm.so      _ZNK10HeapRegion12block_is_objEPK8HeapWord
  1.41%   3.31%  libjvm.so      _ZN10HeapRegion32oops_on_card_seq_iterate_carefulE9MemRegionP24FilterOutOfRegionClosurebPa
  1.13%   3.05%  libjvm.so      _ZN13InstanceKlass18oop_oop_iterate_nvEP7oopDescP24FilterOutOfRegionClosure
  0.98%   0.51%  libjvm.so      _ZN14G1HotCardCache6insertEPa
  0.89%   4.27%  libjvm.so      _ZN13ObjectMonitor11NotRunnableEP6ThreadS1_
  0.85%   1.17%  C2, level 4    org.cache2k.core.HeapCache::insertNewEntry, version 899
  0.74%   3.59%  libjvm.so      _ZN13ObjectMonitor20TrySpin_VaryDurationEP6Thread
  0.74%   0.57%  libjvm.so      _ZN20G1ParScanThreadState10trim_queueEv
  0.70%   0.70%  C2, level 4    org.cache2k.core.Hash2::remove, version 864
  0.69%   0.81%  C2, level 4    org.cache2k.core.ClockProPlusEviction::findEvictionCandidate, version 906
  0.65%   1.59%  C2, level 4    org.cache2k.benchmark.jmh.suite.eviction.symmetrical.generated.RandomSequenceBenchmark_operation_jmhTest::operation_thrpt_jmhStub, version 857
  8.14%  10.65%  <...other 331 warm methods...>
....................................................................................................
100.00%  99.91%  <totals>
As I mentioned in my reply to your other mail, these calls are caused by
region-to-region pointers in G1. Adding those references can be done
either during a safepoint or concurrently. Looking at your profile it
seems that most calls come from the concurrent path, and since your
system has few cores, having the concurrent refinement threads do a
lot of work has a bigger impact on the overall performance.
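If you want to see how much of that refinement work is going on, one option is to
add the remembered set summary diagnostics to the run and to experiment with the
number of refinement threads. This is only a sketch of possible flags for a JDK 8
HotSpot build, and the thread count of 2 is just an arbitrary example value:

  -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=1
  -XX:G1ConcRefinementThreads=2

The summary should give an idea of how much of the card refinement work is done
concurrently versus during the pauses.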
Stefan
Best,
Jens
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use