Re: Confusion regarding 'mark-sweep-compact' naming

Gil Tene Fri, 16 Aug 2019 01:08:19 -0700

You make a good point about historical terminology classifying "mark/sweep" 
and
"mark/compact". But that terminology predates, and could not have 
anticipated, the
multitude of (sometimes fairly dramatic) variants within the universe of 
compacting
collectors that came in the decades that followed.

I'll remind folks that back when those terms were used to mean those 
specific
things, the term "parallel" was also used to describe what we've evolved to
calling "concurrent" collectors for the past 25 years or so.

Classification terms evolve as the art evolves.

On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev wrote:
>
> On 8/14/19 9:21 PM, Gil Tene wrote: 
> > The fact that they visit and parse dead objects in order to identify 
> recycle-able memory is what makes them 
> > sweepers. Other techniques that don't visit dead objects do not perform 
> a sweep. 
>
> I am trying to understand what is your central argument here. This seems 
> to be it. Are you saying 
> that "sweeping" is when you visit dead objects, and non-"sweeping" is when 
> you do not? 
>

Yes. I think that's a very good summary.

To draw a specific line: If you end up visiting each individual dead 
object, you are sweeping.
If you only visit each "hole" once (which may comprise of 1 or more dead 
objects, and covers
everything between one line object and the next), the number of visits you 
make will be no
larger than the number of live objects, and you would not be sweeping.

>
> >> I would leave "sweep" as something that reclaims the storage of the 
> objects (i.e. "sweeps away" the 
> >> garbage), which makes M-C and M-S classes more clearly defined and 
> easier to reason about. 
> > 
> > When M-C implies "also does a full linear traversal that visits all dead 
> objects" (aka "a sweep"), that> terminology fails. Examples like C4, G1GC, 
> Shenandoah, and ZGC are all Mark-Compact (NO Sweep) 
> > collectors, while ParallelGC and Serial GC are Mark-Compact (WITH Sweep) 
> collectors. Since the math 
> > behind those varies dramatically in wasy that effect everyday choices, 
> using "M-C" as a classification 
> > that covers both leads people to misunderstand the interactions between 
> e.g. heap size and GC work. 
>
> M-C does not imply "full linear traversal that visits all dead objects". 

We agree on that.

I am saying that M-C (with sweep) does imply that, and M-C (with no sweep) 
does not.

And I am pointing out that the difference between the two is profound 
enough that
use of "M-C" with no qualification makes the following misquotation of 
Stanislaw Ulam
a good fit:

"Describing all non-copying relocating collectors as 'Mark Compact' is like 
referring to the
bulk of zoology as the study of non-elephant animals."

> As the counter-example, it is very practical to use off-side marking 
> bitmaps to walk only live 

objects of the heap. Careful design of such the marking bitmap would yield 
> dramatically better 

results than using whatever self-parsable-heap walk. You can even make it 
> multi-level and 

quite dense to skip over large chunks of sparse heap. 

Yes. both straight livemark bit vectors and livemark bit vectors with 
summaries allow you to
avoid visiting individual dead objects, by effciency jumping form one live 
object to another
without ever visiting or parsing a dead object. And neither one would be 
"sweeping".

> When the marking phase is a separate phase, it stands to reason that the 
> subsequent phases would 
> have to process the marking results somehow. That would naturally do some 
> sort of walk over the data 
> structure that handles marks. If we call that walk "sweeping", then every 
> Mark-* algorithm is 
> necessarily "sweeping" too. So we have to break this somehow to make the 
> definition viable. 
>

Sweeping in not the processing of marking results, or the visiting or 
processing of the live objects.
It is the visiting and processing of dead objects.

>
> Is walking _dead_ objects the discriminator for "sweeping" then? So in 
> your book, if we take the 
> same Lisp2 collector, and compute-new-addresses and adjust-pointers steps 
> are walking the 
> self-parsable heap (visiting dead objects along the way), it is M-S-C? But 
> if it uses marking bitmap 
> (thus only visiting live objects), it becomes M-C? [This would be a weird 
> flex, but okay]. 
>

Exactly. "In my book" adding an efficient livemark bit vector with possible 
summary layers would
covert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact 
(with no sweep).

ParallelGC doesn't do that, and sticks with the classic Mark-Sweep-Compact. 
But if it did
add that mechanism, it would significantly improve in efficiency for 
low-occupancy oldgens,
and significantly reduce it's sensitivity to heap size (for a fixed live 
set).

The reason I am keenly aware of the difference is that we use a concurrent 
Mark-Compact
(no sweep) collector not only for the oldgen in C4, but also for the young 
generation.
Since the young generation tends to be very sparse, the difference between 
having a sweep 
or not is very profound, and efficient hopping from one live-mark to the 
next is key.

>
> -Aleksey 
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web, visit 
https://groups.google.com/d/msgid/mechanical-sympathy/53552321-db05-4ec8-9681-f297ce4e75b4%40googlegroups.com.

Re: Confusion regarding 'mark-sweep-compact' naming

Reply via email to