wingo pushed a commit to branch wip-whippet
in repository guile.

commit 7b60164cacbde69e3420d31508d06660503a6e2f
Author: Andy Wingo <wi...@igalia.com>
AuthorDate: Mon Mar 7 11:31:28 2022 +0100

    Update README
---
 README.md | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/README.md b/README.md
index 518736410..3323d1377 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,80 @@
 This repository is a workbench for implementing different GCs.  It's a
 scratch space.
 
+## What's there
+
+There's just the (modified) GCBench, which is an old but standard
+benchmark that allocates different sizes of binary trees.  It takes a
+heap of 25 MB or so, not very large, and causes somewhere between 20 and
+50 collections, running in 100 to 500 milliseconds on 2022 machines.
+
+Then there are currently three collectors:
+
+ - `bdw.h`: The external BDW-GC conservative parallel stop-the-world
+   mark-sweep segregated-fits collector with lazy sweeping.
+ - `semi.h`: Semispace copying collector.
+ - `mark-sweep.h`: Stop-the-world mark-sweep segregated-fits collector
+   with lazy sweeping.
+
+The two latter collectors reserve one word per object on the header,
+which makes them collect more frequently than `bdw` because the `Node`
+data type takes 32 bytes instead of 24 bytes.
+
+These collectors are sketches and exercises for improving Guile's
+garbage collector.  Guile currently uses BDW-GC.  In Guile if we have an
+object reference we generally have to be able to know what kind of
+object it is, because there are few global invariants enforced by
+typing.  Therefore it is reasonable to consider allowing the GC and the
+application to share the first word of an object, for example to store a
+mark bit, to allow the application to know what kind an object is, to
+allow the GC to find references within the object, to allow the GC to
+compute the object's size, and so on.
+
+The GCBench benchmark is small but then again many Guile processes also
+are quite short-lived, so perhaps it is useful to ensure that small
+heaps remain lightweight.
+
+Guile has a widely used C API and implements part of its run-time in C.
+For this reason it may be infeasible to require precise enumeration of
+GC roots -- we may need to allow GC roots to be conservatively
+identified from data sections and from stacks.  Such conservative roots
+would be pinned, but other objects can be moved by the collector if it
+chooses to do so.  We assume that object references within a heap object
+can be precisely identified.  (The current BDW-GC scans for references
+conservatively even on the heap.)
+
+A likely good solution for Guile would be an [Immix
+collector](https://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf)
+with conservative roots, and a parallel stop-the-world mark/evacuate
+phase.  We would probably follow the [Rust
+implementation](http://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-2016.pdf),
+more or less, with support for per-line pinning.  In an ideal world we
+would work out some kind of generational solution as well, either via a
+semispace nursery or via sticky mark bits, but this requires Guile to
+use a write barrier -- something that's possible to do within Guile
+itself but it's unclear if we can extend this obligation to users of
+Guile's C API.
+
+In any case, these experiments also have the goal of identifying a
+smallish GC abstraction in Guile, so that we might consider evolving GC
+implementation in the future without too much pain.  If we switch away
+from BDW-GC, we should be able to evaluate that it's a win for a large
+majority of use cases.
+
+## To do
+
+ - [ ] Implement a parallel marker for the mark-sweep collector.
+ - [ ] Adapt GCBench for multiple mutator threads.
+ - [ ] Implement precise non-moving Immix whole-heap collector.
+ - [ ] Add evacuation to Immix whole-heap collector.
+ - [ ] Add parallelism to Immix stop-the-world phase.
+ - [ ] Implement conservative root-finding for the mark-sweep collector.
+ - [ ] Implement conservative root-finding and pinning for Immix.
+ - [ ] Implement generational GC with semispace nursery and mark-sweep
+   old generation.
+ - [ ] Implement generational GC with semispace nursery and Immix
+   old generation.
+
 ## License
 
 GCBench.c, MT_GCBench.c, and MT_GCBench2.c are from

Reply via email to