Re: Increased ScanRS time when decreasing G1RSetUpdatingPauseTimePercent

Thomas Schatzl Mon, 20 Jan 2020 06:18:47 -0800

Hi Joakim,

On 19.01.20 12:02, Joakim Thun wrote:

Hi all,
I would really appreciate some help understanding a G1 behaviour I am seeing when decreasing the value of G1RSetUpdatingPauseTimePercent, where the goal is to decrease the time spent in the UpdateRS phase by moving some of the work to be processed concurrently by the refinement threads.
The behaviour I was expecting to see was a decrease in UpdateRS time, which I am seeing, but at the expense of more time being spent in the ScanRS phase, so the end result, i.e. the total pause time, ends up being very similar with and without the flag set. Decreasing G1RSetUpdatingPauseTimePercent to both 5 and 1 results in similar behaviour. I noticed that the number of scanned cards is much higher in the ScanRS phase when decreasing G1RSetUpdatingPauseTimePercent.
Is this expected behaviour?


TLDR: yes.

Longer version:

The purpose of the refinement threads and the refinement queues (which are processed during Update RS) is to update the remembered sets (whose processing is attributed to the Scan RS time) after some filtering (Is that card already in a remembered set? Can we drop it for other reasons?).

If an entry/card in the refinement queues has not been processed before GC, it must be processed during GC (not all of the filtering needs to be applied there).

What is cheaper to do during GC, scanning remembered sets or refinement queues? It depends on the contents of the card. If it contains references to a lot of regions in the collection set, then it is probably cheaper to let it stay in the refinement queue. If it does not contain a reference to any region in the collection set, then putting it into the remembered sets is a win because we moved otherwise unnecessary work out of the pause.

There are a lot of different arguments about what the optimal location for a card should be; some of these decisions have an impact outside of the GC pause too.

E.g. a card in the refinement queue that has not yet been processed is never re-enqueued - this saves enqueuing and processing work at mutator time; however, since such cards may not contain any references into the collection set (which you only know once you process them), keeping them in the queue would make the pause time slightly longer.

As long as the card in the refinement buffer contains a reference to the collection set, G1 would scan it anyway (it would be in some remembered set), and retrieving values from the refinement queue during GC is (very slightly) faster than from the remembered sets.


Overall there is no rule that "Update RS" work is bad while "Scan RS" isn't.

In your case, since you are trading Update RS time for Scan RS time, I would argue that it's better to have the cards in the refinement queue.
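To make the trade-off above concrete, here is a toy model in Python. It is only an illustrative sketch: the `Card` type, the `pause_work` function and the region ids are hypothetical, it counts cards rather than time, and it ignores filtering and duplicate entries. A card left in the refinement queue is counted under Update RS; a card refined before the pause is counted under Scan RS only if it references a region in the collection set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    # Hypothetical model: ids of the regions that references on this card point into.
    target_regions: frozenset


def pause_work(cards, refined, collection_set):
    """Return (update_rs_cards, scan_rs_cards) for one pause in this toy model.

    cards:          list of Card
    refined:        set of indices of cards refined concurrently before the pause
    collection_set: set of region ids being evacuated
    """
    update_rs = 0
    scan_rs = 0
    for index, card in enumerate(cards):
        if index not in refined:
            # Still in the refinement queue: must be processed during the pause.
            update_rs += 1
        elif card.target_regions & collection_set:
            # Already in a remembered set of a collection set region: scanned there.
            scan_rs += 1
        # Refined and not pointing into the collection set: no pause work at all.
    return update_rs, scan_rs


# Two cards point into region 1 (in the collection set), two into region 7 (not in it).
cards = [Card(frozenset({1})), Card(frozenset({1})),
         Card(frozenset({7})), Card(frozenset({7}))]
collection_set = {1}

print(pause_work(cards, set(), collection_set))         # (4, 0)
print(pause_work(cards, {0, 1}, collection_set))        # (2, 2)
print(pause_work(cards, {2, 3}, collection_set))        # (2, 0)
print(pause_work(cards, {0, 1, 2, 3}, collection_set))  # (0, 2)
```

In this model, refining only the cards that point into the collection set moves work from Update RS to Scan RS without reducing the total, which matches the behaviour described in the question. The total only drops when the refined cards turn out not to reference the collection set.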

Are there any other flags worth considering to improve the ScanRS time while moving more work to the refinement threads?

One could try to control refinement work manually by setting the various thresholds. No guarantees that this improves your situation.

Logging "gc+ergo+refine=debug" may help with debugging the adaptive refinement thresholds; "gc+remset=trace" gives some general information about concurrent refinement.


Some rundown on the options:

G1UseAdaptiveConcRefinement: enable adaptive refinement, i.e. try to observe G1RSetUpdatingPauseTimePercent.

G1UpdateBufferSize (default 256): size of a buffer in the refinement queue, i.e. individual threads will cache that number of cards to process later until they are made available to the refinement threads.

G1ConcRefinementGreenZone, G1ConcRefinementYellowZone, G1ConcRefinementRedZone: some thresholds that control refinement threads. If the number of buffers (see above) is lower than the green threshold, there is no concurrent refinement activity. From the green to the yellow threshold, increasingly more concurrent refinement threads will be used. If the number of buffers reaches the red threshold, mutator threads will do the work.

If G1UseAdaptiveConcRefinement is enabled, the thresholds are changed adaptively, and the ones you give on the command line are initial values. Otherwise the thresholds are fixed.
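The zone behaviour described above can be sketched as a small Python function. This is a simplified model, not HotSpot's implementation: the function name is hypothetical, the linear ramp between the green and yellow thresholds is an assumption (the description only says "increasingly more" threads), and so is keeping all refinement threads active from the yellow threshold upwards.

```python
def refinement_activity(num_buffers, green, yellow, red, max_threads):
    """Return (active_refinement_threads, mutators_refine) for a buffer count.

    Toy model of the green/yellow/red zones; thresholds are in buffers.
    """
    if not (0 <= green <= yellow <= red):
        raise ValueError("expected 0 <= green <= yellow <= red")

    if num_buffers < green:
        # Below green: no concurrent refinement activity.
        threads = 0
    elif num_buffers >= yellow:
        # Assumption: at or above yellow, all refinement threads are running.
        threads = max_threads
    else:
        # Assumption: linear ramp from one thread at green up to max_threads.
        span = yellow - green
        threads = min(max_threads, 1 + (num_buffers - green) * max_threads // span)

    # At or above red, mutator threads process buffers themselves.
    mutators_refine = num_buffers >= red
    return threads, mutators_refine


print(refinement_activity(5, 10, 30, 50, 4))   # (0, False)
print(refinement_activity(10, 10, 30, 50, 4))  # (1, False)
print(refinement_activity(20, 10, 30, 50, 4))  # (3, False)
print(refinement_activity(30, 10, 30, 50, 4))  # (4, False)
print(refinement_activity(50, 10, 30, 50, 4))  # (4, True)
```

With adaptive refinement enabled, the `green`, `yellow` and `red` values in such a model would change from pause to pause; with it disabled they stay at the command-line values.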


G1ConcRefinementThreads: max number of refinement threads.

So you could completely disable concurrent refinement by disabling G1UseAdaptiveConcRefinement and setting G1ConcRefinementThreads=0; this will make the mutators do all the work immediately if you set the red threshold to 0 too. If you set G1UpdateBufferSize to 1 too, the mutators will immediately do all work, I think (this will likely have a significant impact on mutator performance).

Otherwise, using the thresholds, you can select the amount of concurrent refinement work in a very granular way.
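As a rough illustration of how the options above combine, the following Python sketch assembles a list of JVM arguments for fixed (non-adaptive) thresholds. The helper name and its parameters are hypothetical. The flag spellings follow the HotSpot names discussed in this thread, with G1ConcRefinementThreads used for the refinement thread count; their availability depends on the JDK version, and the -Xlog syntax assumes unified logging (JDK 9 or later). The ordering check is the sketch's own sanity check.

```python
def fixed_refinement_flags(green, yellow, red, threads, buffer_size=None, log=True):
    """Build JVM arguments for manually controlled concurrent refinement."""
    if not (0 <= green <= yellow <= red):
        raise ValueError("expected 0 <= green <= yellow <= red")

    flags = [
        "-XX:-G1UseAdaptiveConcRefinement",          # keep thresholds fixed
        f"-XX:G1ConcRefinementGreenZone={green}",
        f"-XX:G1ConcRefinementYellowZone={yellow}",
        f"-XX:G1ConcRefinementRedZone={red}",
        f"-XX:G1ConcRefinementThreads={threads}",
    ]
    if buffer_size is not None:
        flags.append(f"-XX:G1UpdateBufferSize={buffer_size}")
    if log:
        # Logging tags mentioned earlier in this mail.
        flags.append("-Xlog:gc+ergo+refine=debug,gc+remset=trace")
    return flags


# The "mutators do everything" setup from above: no refinement threads,
# red threshold 0, buffer size 1. Setting green and yellow to 0 as well
# is an assumption made here to keep the thresholds ordered.
print(" ".join(fixed_refinement_flags(0, 0, 0, threads=0, buffer_size=1)))
```

The resulting list can be passed to a `java` invocation ahead of the main class; as noted above, there is no guarantee that any particular combination improves pause times.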


Thanks,
  Thomas
_______________________________________________
hotspot-gc-use mailing list
[email protected]
https://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use