Hi Jens,

On 2017-04-25 05:25, Jens Wilke wrote:
> Hi Stefan,
>
> On Monday, 24 April 2017 21:16:20 ICT Stefan Johansson wrote:
>> I've tried with CMS, G1 and Parallel, both with 10g and 20g heap, but so
>> far I can't reproduce your problems. It would be great if you could
>> provide us with some more information, for example GC logs and the
>> result files. We might be able to dig something out of them.
> The logs from the measurement on my notebook for the first mail (see
> below) are available at (link valid for 30 days only):
>
> http://ovh.to/FzKbgrb
>
> What environment are you testing on?
I only did some quick testing on my desktop, which has 12 cores and
hyper-threading, so the default is to use 18 parallel GC threads on my
system.
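
As a side note, and only as a rough sketch since the exact ergonomics
differ between releases: HotSpot derives the default number of parallel
GC threads from the hardware thread count roughly as below, which is
where the 18 on my 24-hardware-thread desktop comes from.

    // Rough sketch of HotSpot's ergonomic default; details differ per
    // release: all CPUs up to 8, then 5/8 of the CPUs beyond 8.
    static int defaultParallelGCThreads(int ncpus) {
        return ncpus <= 8 ? ncpus : 8 + (ncpus - 8) * 5 / 8;
    }
    // defaultParallelGCThreads(24) == 18

The default can be overridden with -XX:ParallelGCThreads=<n>.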
> The benchmarks I am conducting use four workload threads on four CPU
> cores. The example I sent uses four workload threads, so in your
> environment you have enough spare cores for GC work and you don't see
> the performance difference compared to the CMS collector.
>
> The benchmark is designed to have a constrained core count and to keep
> those cores maximally busy.
I see. Under those circumstances G1 will have a harder time keeping up
than the other collectors due to concurrent refinement. You might be
able to tune your way out of this, or at least improve the situation,
but I'm not sure that is what you're looking for.
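
If you want to experiment anyway, a starting point could be to limit
how many threads refinement and the collector get, so they compete less
with your four workload threads. A sketch, not a recommendation, since
the right values depend on the workload:

    java -XX:+UseG1GC \
         -XX:G1ConcRefinementThreads=1 \
         -XX:ConcGCThreads=1 \
         -XX:ParallelGCThreads=4 \
         ...

Fewer refinement threads steal less CPU time concurrently, at the cost
of leaving more update buffer work to be finished during the pauses.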
>> As I mentioned in my reply to your other mail, these calls are caused
>> by region-to-region pointers in G1. Adding those references can be done
>> either during a safepoint or concurrently. Looking at your profile, it
>> seems that most calls come from the concurrent path, and since your
>> system has few cores, having the concurrent refinement threads do a lot
>> of work will have a bigger impact on overall performance.
> Yes.
>
> I have the feeling that there is some kind of "tipping point" in the
> whole system that causes the high refinement activity, which would be
> interesting to understand.
>
> For the moment I'll postpone digging into this deeper. It's "just" a
> benchmark scenario which triggers this effect. I believe that
> interactive applications that would make use of G1 and its low pause
> times don't have these large cache sizes.
I agree that this is not a benchmark or scenario where we expect G1 to
be the best choice. My impression is that this is a very
throughput-oriented benchmark, and especially when run in a constrained
environment this will be tough on G1. Still, as you said, it would be
interesting to understand at which point things start to go bad, and to
work on improving that.
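
To see where that point is, the remembered set statistics can help. On
JDK 8 something along these lines prints periodic summaries (diagnostic
options, so they need to be unlocked):

    java -XX:+UnlockDiagnosticVMOptions \
         -XX:+G1SummarizeRSetStats \
         -XX:G1SummarizeRSetStatsPeriod=1 \
         ...

With the unified logging coming in JDK 9 the equivalent should be
something like -Xlog:gc+remset=debug.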
> Using JMH to get reliable benchmark results for scenarios with large
> heaps needs some more work, too. AFAIK I am the only one doing "not so
> micro" benchmarks with JMH.
>
> Thanks for looking into this!
Thanks again for sharing your findings, and if you have more
interesting benchmarks/results to share, please do so.
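
For anyone who wants to set up something similar, a minimal JMH
skeleton for this kind of "not so micro" benchmark could look like the
sketch below. The class name, sizes and heap settings are only
illustrative, not Jens' actual benchmark:

    import java.util.concurrent.ThreadLocalRandom;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    @Threads(4) // four workload threads, matching the setup above
    @Fork(value = 1, jvmArgsAppend = {"-Xms10g", "-Xmx10g", "-XX:+UseG1GC"})
    public class LargeCacheBenchmark {

        Object[] cache; // large, long-lived state: a few GB of live data

        @Setup
        public void setup() {
            cache = new Object[50_000_000];
            for (int i = 0; i < cache.length; i++) {
                cache[i] = new long[4];
            }
        }

        @Benchmark
        public Object readAndReplace() {
            // Replacing old entries with fresh objects creates the
            // cross-region pointers that keep G1's refinement busy.
            int i = ThreadLocalRandom.current().nextInt(cache.length);
            Object old = cache[i];
            cache[i] = new long[4];
            return old;
        }
    }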

Stefan


> Best,
>
> Jens


_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
