
On Aug 13, 2019, at 11:44 PM, Peter Veentjer <[email protected]> wrote:

Giving it a bit more thought:

the parallel and serial collectors are categorized as 'mark-sweep-compact' 
collectors.

https://plumbr.io/handbook/garbage-collection-algorithms/removing-unused-objects/compact
https://www.azul.com/files/Understanding_Java_Garbage_Collection_v4.pdf
https://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html

But in both cases, the young collector is an evacuating collector, while the old 
collector uses a mark-compact algorithm.

The old collector and the young collector are often completely different 
collectors, which can (and quite often do) use completely different collector 
algorithms. There is no point in trying to mix the separate collectors and 
collector algorithms and classify them into one category.

There is also no point in trying to rationalize how the OpenJDK newgen 
collectors (all of which are copying collectors) fit into the 
Mark/Sweep/Compact categorization. They don’t.

Copying collection is a completely different category. It is a form of 
evacuating collection, and it does use a tracing pass. But it (generally) 
performs the entire collection in a single tracing pass (there is no separate 
mark, sweep, and/or compact), and its cost is linear only in the live set, 
which makes it fundamentally efficient in very sparsely populated heaps, and an 
obvious fit for the youngest generation of a multi-generation collector (as the 
generational hypothesis is still nearly-universally applicable, and sparseness 
is extremely common, likely, and easy to achieve).
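To make the single-pass property concrete, here is a minimal sketch of 
Cheney-style copying collection, simulated in Python. Objects are plain dicts 
rather than real heap words, and all names are invented for illustration; this 
is not how any JVM collector is actually coded.

```python
def copy_collect(roots):
    """Cheney-style copying collection, simulated: tracing and copying
    happen in one pass, so work is linear in the live set, not the heap."""
    to_space = []   # (original, copy) pairs, in evacuation order
    forward = {}    # id(original) -> copy, i.e. the forwarding pointer

    def evacuate(obj):
        # Copy obj into to-space once; later visits hit the forwarding map.
        if id(obj) not in forward:
            copy = {"fields": []}
            forward[id(obj)] = copy
            to_space.append((obj, copy))
        return forward[id(obj)]

    new_roots = [evacuate(r) for r in roots]

    # Cheney scan pointer: everything behind `scan` is fully fixed up,
    # everything ahead still needs its fields evacuated.
    scan = 0
    while scan < len(to_space):
        original, copy = to_space[scan]
        copy["fields"] = [evacuate(f) for f in original["fields"]]
        scan += 1

    return new_roots, [c for _, c in to_space]

# Dead objects are never visited at all; that is where the
# "linear to live set" property comes from.
a = {"fields": []}
b = {"fields": [a]}
dead = {"fields": [a]}          # unreachable from the roots
roots, live = copy_collect([b])
print(len(live))                # 2: only b and a were evacuated
```

Note there is no mark bitmap, no sweep, and no separate compact step: the act 
of evacuating *is* the marking and the compaction.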

Its “downsides” include needing an empty to-space as big as the from-space to 
be able to run safely, since the from-space needs to be completely evacuated 
once evacuation starts, and the efficient single-pass copying scheme cannot 
know (before starting) how much of the from-space is alive. This (2x heap size 
needed) makes copying collection a poor fit for the oldest generation of a 
collector (multi-generational or not).

Newgen collectors that are copying collectors overcome these downsides by being 
able to promote into oldgen as a bail-out mechanism (if the to-space is not 
large enough to hold the newgen live set). As a result of this oldgen safety 
net, the to-space in newgen collectors is typically configured to be a small 
fraction of the worst case they’d have to keep around if they had nowhere else 
to put stuff.
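A rough sketch of that promotion bail-out, again simulated in Python with dicts 
standing in for objects. The capacity number and all names are invented for 
illustration; real newgen collectors also track survivor ages, use survivor 
spaces, and much more.

```python
def minor_collect(roots, to_space_capacity):
    """Copying young collection with promotion as the bail-out:
    once to-space is full, further survivors go to old gen instead
    of causing the collection to fail."""
    to_space, old_gen = [], []
    forward = {}   # id(original) -> copy (forwarding pointer)
    work = []      # (original, copy) pairs awaiting the scan

    def evacuate(obj):
        if id(obj) not in forward:
            copy = {"fields": []}
            # Bail-out: if to-space is full, "promote" into old gen.
            target = to_space if len(to_space) < to_space_capacity else old_gen
            target.append(copy)
            forward[id(obj)] = copy
            work.append((obj, copy))
        return forward[id(obj)]

    new_roots = [evacuate(r) for r in roots]
    scan = 0
    while scan < len(work):
        original, copy = work[scan]
        copy["fields"] = [evacuate(f) for f in original["fields"]]
        scan += 1
    return new_roots, to_space, old_gen

# Five live objects in a chain, but a to-space that only holds three:
# the overflow survives by being promoted rather than aborting.
objs = [{"fields": []} for _ in range(5)]
for i in range(4):
    objs[i]["fields"].append(objs[i + 1])
_, to_space, old_gen = minor_collect([objs[0]], to_space_capacity=3)
print(len(to_space), len(old_gen))   # 3 2
```

This is why the to-space can be sized for the expected (small) survivor set 
rather than the worst case: the old generation absorbs the overflow.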

Note that there are variants of copying collectors that are “abortable” in the 
sense that the collection can safely stop without completely evacuating the 
from-space. But the variants I’ve seen that do that are less efficient (I don’t 
think I’ve seen a single-pass variant that is abortable).


This brings up the question: how does sweep enter the picture?

CMS could be categorized as mark-sweep-compact: it does a mark-sweep for the 
old generation, and in case of a concurrent mode failure it falls back on the 
parallel collector, which does a mark-compact. So in that case I could 
understand why it would be called mark-sweep-compact; but CMS is never called 
mark-sweep-compact.



On Wednesday, August 14, 2019 at 9:34:15 AM UTC+3, Peter Veentjer wrote:
Thanks for your answer Aleksey,

comments inline.

On Tuesday, August 6, 2019 at 12:41:28 PM UTC+3, Aleksey Shipilev wrote:
On 8/6/19 7:38 AM, Peter Veentjer wrote:
> There is still some confusion on my side. This time it is regarding the 
> algorithmic complexity of
> mark-sweep-compact.
>
> The complexity of the mark phase is linear to the live set.
> The complexity of the sweep phase is linear to the heap size.
> The complexity of the compact phase is linear to the live set.

Well...

The multiplicative factors in either case might make one linearity perform 
better than the other on
wildly different heap sizes and live sets. The adversarial example of this is N 
large byte arrays of
size M. Compact might have to copy N*M bytes (especially when doing sliding 
compaction), and sweep
might only need to visit N locations to check on object metadata.
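Putting rough numbers on that adversarial case (the values of N and M are made 
up for illustration):

```python
# Back-of-envelope costs for a heap of N large byte arrays of M bytes
# each, all live. Both phases are "linear", but in different quantities
# and with wildly different constant factors.
N = 1_000                  # number of arrays
M = 1_048_576              # 1 MiB per array

bytes_moved_by_compact = N * M   # sliding compaction may move every live byte
headers_visited_by_sweep = N     # sweep only checks per-object metadata

print(bytes_moved_by_compact)    # 1048576000 -- about a gigabyte of copying
print(headers_visited_by_sweep)  # 1000
```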

There are second-order heap size effects for both mark and compact. Take two 
heaps with the same live set, one very dense and one very sparse; the mere fact 
that you have to walk memory far away adds a secondary effect of heap size. It 
would not linger for compacting collectors, though, as the first few compaction 
cycles would collapse the heap into the denser variant.

That is all to say that algorithmic complexity is a seriously bad way to reason 
about (GC) performance.

I understand. I created a bunch of flashcards to get a better conceptual 
understanding of the different GC algorithms; I'll add some additional 
flashcards to refine my understanding.


> Why would you ever want to include a sweep phase? The complexity of 
> mark-sweep-compact is now linear to the heap size; a lot worse compared to 
> 'linear to the live set'. Get rid of the sweep and the complexity goes down 
> to 'linear to the live set'. What is the added value of such an inefficient 
> GC implementation?

I think m-s-c nomenclature is confusing, as the actual implementations are 
doing either m-s or m-c
during their individual phases.

Exactly. And this was the source of my confusion, caused by an abuse of names.


The benefit of doing either is well covered.

The hard naming question is how to properly label the collector where different 
phases employ
different algos. For example, in CMS the young collection is copying, old 
collection is mark-sweep,
fall-back full GC is mark-compact. How would you label CMS then?

It would be easier/simpler to label the young and old collector independently; 
but I understand your point.


I believe marking it m-s-c is
borderline tolerable, until users get confused thinking both sweep and compact 
happen in the same phase.

Also, with implementations that manage backing storage in fine-grained chunks 
(for example, free lists), "sweep" might mean dealing with them on a 
per-dead-object basis. For example, this might mean preparing the storage for 
subsequent compaction by "sweeping" out old objects before compaction kicks in, 
which requires free space. Practical implementations I know of opt instead to 
reconstruct the backing storage metadata after massive compaction moves, 
though.

My $0.02c, anyway.

-Aleksey

