You make a good point about historical terminology classifying "mark/sweep" and "mark/compact". But that terminology predates, and could not have anticipated, the multitude of (sometimes fairly dramatic) variants within the universe of compacting collectors that came in the decades that followed.
I'll remind folks that back when those terms were used to mean those specific things, the term "parallel" was also used to describe what we've evolved to calling "concurrent" collectors for the past 25 years or so. Classification terms evolve as the art evolves. On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev wrote: > > On 8/14/19 9:21 PM, Gil Tene wrote: > > The fact that they visit and parse dead objects in order to identify > recycle-able memory is what makes them > > sweepers. Other techniques that don't visit dead objects do not perform > a sweep. > > I am trying to understand what is your central argument here. This seems > to be it. Are you saying > that "sweeping" is when you visit dead objects, and non-"sweeping" is when > you do not? > Yes. I think that's a very good summary. To draw a specific line: If you end up visiting each individual dead object, you are sweeping. If you only visit each "hole" once (which may comprise of 1 or more dead objects, and covers everything between one line object and the next), the number of visits you make will be no larger than the number of live objects, and you would not be sweeping. > > >> I would leave "sweep" as something that reclaims the storage of the > objects (i.e. "sweeps away" the > >> garbage), which makes M-C and M-S classes more clearly defined and > easier to reason about. > > > > When M-C implies "also does a full linear traversal that visits all dead > objects" (aka "a sweep"), that> terminology fails. Examples like C4, G1GC, > Shenandoah, and ZGC are all Mark-Compact (NO Sweep) > > collectors, while ParallelGC and Serial GC are Mark-Compact (WITH Sweep) > collectors. Since the math > > behind those varies dramatically in wasy that effect everyday choices, > using "M-C" as a classification > > that covers both leads people to misunderstand the interactions between > e.g. heap size and GC work. > > M-C does not imply "full linear traversal that visits all dead objects". We agree on that. I am saying that M-C (with sweep) does imply that, and M-C (with no sweep) does not. And I am pointing out that the difference between the two is profound enough that use of "M-C" with no qualification makes the following misquotation of Stanislaw Ulam a good fit: "Describing all non-copying relocating collectors as 'Mark Compact' is like referring to the bulk of zoology as the study of non-elephant animals." > As the counter-example, it is very practical to use off-side marking > bitmaps to walk only live objects of the heap. Careful design of such the marking bitmap would yield > dramatically better results than using whatever self-parsable-heap walk. You can even make it > multi-level and quite dense to skip over large chunks of sparse heap. Yes. both straight livemark bit vectors and livemark bit vectors with summaries allow you to avoid visiting individual dead objects, by effciency jumping form one live object to another without ever visiting or parsing a dead object. And neither one would be "sweeping". > When the marking phase is a separate phase, it stands to reason that the > subsequent phases would > have to process the marking results somehow. That would naturally do some > sort of walk over the data > structure that handles marks. If we call that walk "sweeping", then every > Mark-* algorithm is > necessarily "sweeping" too. So we have to break this somehow to make the > definition viable. > Sweeping in not the processing of marking results, or the visiting or processing of the live objects. It is the visiting and processing of dead objects. > > Is walking _dead_ objects the discriminator for "sweeping" then? So in > your book, if we take the > same Lisp2 collector, and compute-new-addresses and adjust-pointers steps > are walking the > self-parsable heap (visiting dead objects along the way), it is M-S-C? But > if it uses marking bitmap > (thus only visiting live objects), it becomes M-C? [This would be a weird > flex, but okay]. > Exactly. "In my book" adding an efficient livemark bit vector with possible summary layers would covert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact (with no sweep). ParallelGC doesn't do that, and sticks with the classic Mark-Sweep-Compact. But if it did add that mechanism, it would significantly improve in efficiency for low-occupancy oldgens, and significantly reduce it's sensitivity to heap size (for a fixed live set). The reason I am keenly aware of the difference is that we use a concurrent Mark-Compact (no sweep) collector not only for the oldgen in C4, but also for the young generation. Since the young generation tends to be very sparse, the difference between having a sweep or not is very profound, and efficient hopping from one live-mark to the next is key. > > -Aleksey > > > -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/53552321-db05-4ec8-9681-f297ce4e75b4%40googlegroups.com.
