Hi Joakim,

On 19.01.20 12:02, Joakim Thun wrote:
> Hi all,
>
> I would really appreciate some help understanding a G1 behaviour I am seeing when decreasing the value of G1RSetUpdatingPauseTimePercent. The goal is to decrease the time spent in the UpdateRS phase by moving some of the work to the refinement threads, which process it concurrently.
>
> I expected to see a decrease in UpdateRS time, which I do, but at the expense of more time being spent in the ScanRS phase, so the end result, i.e. the total pause time, ends up being very similar with and without the flag set. Decreasing G1RSetUpdatingPauseTimePercent to either 5 or 1 results in similar behaviour. I noticed that the number of scanned cards in the ScanRS phase is much higher when decreasing G1RSetUpdatingPauseTimePercent.
>
> Is this expected behaviour?


TLDR: yes.

Longer version:

The purpose of the refinement threads and the refinement queues (which are processed during Update RS) is to update the remembered sets (whose scanning is attributed to the Scan RS time) after some filtering (is that card already in a remembered set? Can we drop it for other reasons?).

If an entry/card in the refinement queues has not been processed before GC, it must be processed during GC (although not all of the filtering needs to be applied there).

What is cheaper to do during GC, scanning remembered sets or refinement queues? It depends on the contents of the card. If it contains references to a lot of regions in the collection set, then it is probably cheaper to let it stay in the refinement queue. If it does not contain a reference to any region in the collection set, then putting it into the remembered sets is a win, because we moved otherwise unnecessary work out of the pause.

There are a lot of different arguments about what the optimal location for a card is; some of these decisions have an impact outside of the GC pause too. E.g. a card that is in the refinement queue and not yet processed is never re-enqueued, which saves enqueuing and processing work at mutator time; however, since such cards may not contain references into the collection set (which you only know once you process them), keeping them in the queue would make the pause time slightly longer. As long as the card in the refinement buffer contains a reference into the collection set, G1 would have to scan it anyway (it would be in some remembered set), and retrieving values from the refinement queue during GC is (very slightly) faster than retrieving them from the remembered sets.

Overall there is no rule that "Update RS" work is bad while "Scan RS" isn't.

In your case, since you are trading Update RS time for Scan RS time, I would argue that it's better to have the cards in the refinement queue.

> Are there any other flags worth considering to improve the ScanRS time while moving more work to the refinement threads?

One could try to control refinement work manually by setting the various thresholds. There is no guarantee that this improves your situation.

Logging "gc+ergo+refine=debug" may help with debugging the adaptive refinement thresholds; gc+remset=trace gives some general information about concurrent refinement.

A quick rundown of the options:

G1UseAdaptiveConcRefinement: enables adaptive refinement, i.e. G1 tries to observe G1RSetUpdatingPauseTimePercent.

G1UpdateBufferSize (default 256): size of a buffer in the refinement queue, i.e. individual threads cache that many cards for later processing before the buffer is made available to the refinement threads.

G1ConcRefinementGreenZone, G1ConcRefinementYellowZone, G1ConcRefinementRedZone: thresholds that control the refinement threads. If the number of buffers (see above) is lower than the green threshold, there is no concurrent refinement activity. Between the green and the yellow threshold, increasingly more concurrent refinement threads are used. If the number of buffers reaches the red threshold, mutator threads will do the work.

If G1UseAdaptiveConcRefinement is enabled, the thresholds are changed adaptively, and the ones you give on the command line are initial values. Otherwise the thresholds are fixed.
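
To make the zone description above a bit more concrete, here is a minimal sketch of the idea in Java. It is a simplified model and not HotSpot's actual activation logic: the class and method names are made up for illustration, and the linear ramp between green and yellow is an assumption (the real stepping of threads may differ).

```java
// Hypothetical, simplified model of the green/yellow/red zones.
// This is NOT the HotSpot implementation; names and the linear ramp are assumptions.
final class RefinementZonesSketch {

    // Number of concurrent refinement threads that would be active
    // for a given number of completed buffers in the refinement queue.
    static int activeRefinementThreads(int buffers, int green, int yellow, int maxThreads) {
        if (maxThreads <= 0 || buffers < green) {
            return 0; // below green: no concurrent refinement activity
        }
        if (buffers >= yellow) {
            return maxThreads; // at or above yellow: all refinement threads are used
        }
        // Between green and yellow: increasingly more threads (assumed linear ramp).
        double fraction = (double) (buffers - green) / (yellow - green);
        return Math.max(1, (int) Math.ceil(fraction * maxThreads));
    }

    // Once the number of buffers reaches red, mutator threads do refinement work themselves.
    static boolean mutatorsRefine(int buffers, int red) {
        return buffers >= red;
    }

    public static void main(String[] args) {
        int green = 4, yellow = 12, red = 20, maxThreads = 4; // example values only
        for (int buffers : new int[] {2, 6, 8, 12, 25}) {
            System.out.printf("buffers=%d refinementThreads=%d mutatorsRefine=%b%n",
                    buffers,
                    activeRefinementThreads(buffers, green, yellow, maxThreads),
                    mutatorsRefine(buffers, red));
        }
    }
}
```

In this model a red threshold of 0 makes mutatorsRefine() true for any number of buffers, which corresponds to the mutators doing all the work themselves.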

G1ConcRefinementThreads: max number of refinement threads.

So you could completely disable concurrent refinement by disabling G1UseAdaptiveConcRefinement and setting G1ConcRefinementThreads=0; if you also set the red threshold to 0, the mutators will do all the work themselves. If you set G1UpdateBufferSize to 1 as well, the mutators will do all the work immediately, I think (this will likely have a significant impact on mutator performance).

Otherwise, using the thresholds, you can select the amount of concurrent refinement work in a very granular way.
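
When experimenting with these settings it can help to check which values the JVM actually ended up with. The following is a small sketch that reads the flags discussed above through the HotSpotDiagnosticMXBean; the class name and the list of flags are only for illustration. Not every flag necessarily exists in every JDK release, so the sketch reports a missing flag instead of failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (hypothetical name): prints the refinement-related flags of the running JVM.
final class RefinementFlagDump {

    static final String[] FLAGS = {
        "G1RSetUpdatingPauseTimePercent",
        "G1UseAdaptiveConcRefinement",
        "G1UpdateBufferSize",
        "G1ConcRefinementGreenZone",
        "G1ConcRefinementYellowZone",
        "G1ConcRefinementRedZone",
        "G1ConcRefinementThreads"
    };

    // Returns flag name -> "value (origin)", or a note if the flag is unknown to this JVM.
    static Map<String, String> readFlags(String... names) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : names) {
            if (bean == null) { // not a HotSpot-based JVM
                result.put(name, "not available in this JVM");
                continue;
            }
            try {
                VMOption option = bean.getVMOption(name);
                result.put(name, option.getValue() + " (" + option.getOrigin() + ")");
            } catch (IllegalArgumentException e) {
                // getVMOption throws this when the flag does not exist
                result.put(name, "not available in this JVM");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        readFlags(FLAGS).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Run it with the same -XX options as the application to see the values that were picked up; with adaptive refinement enabled these are only the initial thresholds, as described above.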

Thanks,
  Thomas
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use@openjdk.java.net
https://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
