On Friday, August 16, 2019 at 1:30:10 AM UTC-7, Aleksey Shipilev wrote:
> On 8/16/19 10:07 AM, Gil Tene wrote:
> > Classification terms evolve as the art evolves.
>
> What I want to emphasize here is that this discussion reinvents the meaning of "sweep". That definition is not used in the way you describe in the sources I know of. Granted, definitions drift over time, but we have to be careful to separate what is the "agreed on" definition, and what is whatever definition we want to be the emergent one.
>
> > On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev wrote:
> > > I am trying to understand what is your central argument here. This seems to be it. Are you saying that "sweeping" is when you visit dead objects, and non-"sweeping" is when you do not?
> >
> > Yes. I think that's a very good summary.
>
> Would be helpful if we started from this next time around!
True. Maybe this discussion will help others start from there next time. It's certainly helped me hone in on that specific summary.

> > > Is walking _dead_ objects the discriminator for "sweeping" then? So in your book, if we take the same Lisp2 collector, and the compute-new-addresses and adjust-pointers steps are walking the self-parsable heap (visiting dead objects along the way), it is M-S-C? But if it uses a marking bitmap (thus only visiting live objects), it becomes M-C? [This would be a weird flex, but okay.]
> >
> > Exactly. "In my book" adding an efficient livemark bit vector with possible summary layers would convert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact (with no sweep).
>
> So this is what irks me here. In my Epsilon's mark-compact toy collector (https://shipilev.net/jvm/diy-gc/), changing from mark bitmaps to self-parsable heap traversal is about 5 lines of code. This does not feel like passing the bar for reclassification of the collector algo between major classes.

Self-parsable heaps are expensive to traverse, and that expense grows with the size of the non-live part of the heap. Being sensitive (or not) to live-occupancy % is a fundamental quality in a collector. If a "5 lines of code" change allows your collector to stop being sensitive to heap size and stay sensitive only to the live set, it is well worth it. And it also changes the collector's classification. E.g. sometimes changing from a linked list to an array list touches only one line of code, but that change can dramatically alter the classification from a complexity and fit-for-use point of view.

> It feels more like the implementation detail that can swing both ways, depending on circumstances.
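To make that traversal-cost point concrete before responding, here is a toy sketch (my own hypothetical illustration, not Epsilon's actual code): a self-parsable walk hops from header to header and so touches every object, live or dead, while a mark-bitmap walk touches only the live ones.

```java
import java.util.BitSet;

public class TraversalCost {
    // Toy "heap": each object is [sizeInWords, payload...]. The size word in
    // the header makes the heap self-parsable: you can hop object to object.
    static int visitBySelfParse(int[] heap, int used) {
        int visited = 0;
        for (int addr = 0; addr < used; addr += heap[addr]) {
            visited++;                      // visits EVERY object, live or dead
        }
        return visited;
    }

    // With a mark bitmap (a bit set at each live object's start address),
    // the walk skips dead objects entirely.
    static int visitByBitmap(BitSet liveMarks) {
        int visited = 0;
        for (int addr = liveMarks.nextSetBit(0); addr >= 0;
             addr = liveMarks.nextSetBit(addr + 1)) {
            visited++;                      // visits only LIVE objects
        }
        return visited;
    }

    public static void main(String[] args) {
        // A heap of 1000 two-word objects, of which only 10 are live.
        int[] heap = new int[2000];
        BitSet live = new BitSet();
        for (int addr = 0; addr < 2000; addr += 2) heap[addr] = 2;
        for (int i = 0; i < 10; i++) live.set(i * 200);
        System.out.println(visitBySelfParse(heap, 2000)); // 1000: cost tracks heap size
        System.out.println(visitByBitmap(live));          // 10: cost tracks live set
    }
}
```

The "5 lines of code" swing between these two loops is exactly what changes whether the pass costs O(heap) or O(live set).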
ParallelGC probably wouldn't call the file that implements its Mark-Compact (with sweep) algorithm "MarkSweep.cpp" if the authors of that code thought of their implementation as a Mark-Compact that does not include a sweep, or if they thought of the sweeping as a small implementation detail.

If it were just an implementation detail, there wouldn't be scores of academic papers that seem to classify "Mark-Compact" (the historic kind that visits all dead objects, and so implies sweeping) as inherently less CPU-efficient, and that back this category-wide conclusion with analyses and measurements which all use sweeping in their Mark-Compact implementations. And there wouldn't be similar analyses that seem to classify all non-Mark-Compact moving collectors (putting them into the only remaining categories of semi-space or regional evacuating) as less memory-efficient, based on their assumed need for space beyond the "in place" compaction that canonical Mark-Compact (with implied sweeping) is known to be able to do. These classifications (historically assuming that Mark-Compact includes visiting all dead objects, and/or that anything else can't do in-place compaction) both lead to historical conclusions that no longer apply to more modern implementations that still clearly belong in the Mark-Compact category.

Here are two implementation-orthogonal characteristics of any moving collector, both fundamental to analyzing fitness for use in various common scenarios:

1. Do you visit all objects, or just the live ones? [Which I call "do you sweep?"]

2. Do you need extra space in order to compact, or can you perform the compaction within the memory previously occupied by dead objects? [Which I don't have a great name for, but it is demonstrably not "do you do in-place compaction?"]

E.g. for an algorithm used in a young-generation collector, which fundamentally hopes to leverage the observed quality that the generation's live set is much smaller than the generation's heap size, the first quality is critical. And e.g. for an algorithm used in the oldest generation, which needs to handle a large live-occupancy % (e.g. in the 40%-90% range), the second quality matters quite a bit: without it, either heap occupancy is capped (as is the case in many semi-space collectors), or CPU-efficiency drops dramatically (much faster than the linear relationship to the heap:live ratio) in the case of compactors that have to stop compacting mid-cycle because they've run out of room to complete.
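To illustrate that second quality, here is a toy sketch (my own hypothetical code, not any real collector's) of sliding compaction that works entirely within the space freed by dead objects, driven by a mark bitmap so that no dead object is ever visited. A real LISP2-style collector would also need compute-new-addresses and adjust-pointers passes; those are omitted here because these toy objects hold no references.

```java
import java.util.BitSet;

public class SlideCompact {
    // Toy sliding compaction: live objects (found via the mark bitmap) are
    // slid toward address 0, reusing the memory previously occupied by dead
    // objects. No second "to-space" is needed, unlike a semi-space collector,
    // which caps usable occupancy at roughly 50% of committed memory.
    static int compact(int[] heap, BitSet liveMarks) {
        int free = 0;                                  // next destination address
        for (int addr = liveMarks.nextSetBit(0); addr >= 0;
             addr = liveMarks.nextSetBit(addr + 1)) {
            int size = heap[addr];                     // header word holds size
            // Sliding order guarantees free <= addr, so the copy is in-place
            // safe (System.arraycopy handles overlapping ranges correctly).
            System.arraycopy(heap, addr, heap, free, size);
            free += size;
        }
        return free;                                   // new top of the heap
    }

    public static void main(String[] args) {
        // Ten two-word objects [size, payload]; only three are live.
        int[] heap = new int[20];
        for (int addr = 0; addr < 20; addr += 2) {
            heap[addr] = 2;
            heap[addr + 1] = addr;                     // payload = original address
        }
        BitSet live = new BitSet();
        live.set(4); live.set(12); live.set(18);
        int top = compact(heap, live);
        System.out.println(top);                       // 6: three objects remain
        System.out.println(heap[1] + " " + heap[3] + " " + heap[5]); // 4 12 18
    }
}
```

Note that the compaction completes using only the heap itself: the room it needs is exactly the room the dead objects left behind, which is the property the semi-space and "extra space" analyses assume Mark-Compact alone can have.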