wingo pushed a commit to branch wip-whippet
in repository guile.

commit 506b4187fcff9a3eda827a11f9706c2393627e5a
Author: Andy Wingo <wi...@igalia.com>
AuthorDate: Mon Sep 16 14:33:30 2024 +0200

    Update manual
---
 README.md            |  8 ++++---
 doc/collector-bdw.md |  6 +++++
 doc/manual.md        | 67 ++++++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 52e98e77b..b3689fcd9 100644
--- a/README.md
+++ b/README.md
@@ -26,6 +26,8 @@ See the [documentation](./doc/README.md).
  - Inline allocation / write barrier fast paths (supporting JIT)
  - One unified API with no-overhead abstraction: switch collectors when
    you like
+ - Three policies for sizing heaps: fixed, proportional to live size, and
+   [MemBalancer](http://marisa.moe/balancer.html)
 
 ## Source repository structure
 
@@ -42,9 +44,9 @@ See the [documentation](./doc/README.md).
 
 ## Status and roadmap
 
-As of September 2024, Whippet is almost feature-complete. The main
-missing feature is dynamic heap growth and shrinkage
-(https://github.com/wingo/whippet/issues/5), which should land soon.
+As of September 2024, Whippet is almost feature-complete. We need to
+land a per-object pinning API, and an API for cooperative safepoints for
+use by threads that are looping without allocating.
 
 After that, the next phase on the roadmap is support for tracing, and
 some performance noodling.
diff --git a/doc/collector-bdw.md b/doc/collector-bdw.md
index b86a9d3b1..5a38b4e2e 100644
--- a/doc/collector-bdw.md
+++ b/doc/collector-bdw.md
@@ -15,6 +15,12 @@ finalizers), and both ephemerons and finalizers only approximate the
 Whippet behavior, because they are implemented in terms of what BDW-GC
 provides.
 
+`bdw` supports the `fixed` and `growable` heap-sizing policies, but not
+`adaptive`, as BDW-GC can't reliably return memory to the OS. Also,
+[`growable` has an effective limit of a 3x heap
+multiplier](https://github.com/wingo/whippet/blob/main/src/bdw.c#L478).
+Oh well!
+
 It's a bit of an oddball from a Whippet perspective, but useful as a
 migration path if you have an embedder that is already using BDW-GC.
 And, it is a useful performance comparison.
diff --git a/doc/manual.md b/doc/manual.md
index e88ac8198..45e8d019d 100644
--- a/doc/manual.md
+++ b/doc/manual.md
@@ -77,13 +77,15 @@ visitor function on all outgoing edges in an object. It also includes a
 of an object. `trace_edge` and `size` may be `NULL`, in which case no
 tracing or size computation should be performed.
 
-### Tracing ephemerons
+### Tracing ephemerons and finalizers
 
 Most kinds of GC-managed object are defined by the program, but the GC
-itself has support for a specific object kind: ephemerons. If the
-program allocates ephemerons, it should trace them in the
-`gc_trace_object` function by calling `gc_trace_ephemeron` from
-[`gc-ephemerons.h`](../api/gc-ephemerons.h).
+itself has support for two specific object kind: ephemerons and
+finalizers. If the program allocates ephemerons, it should trace them
+in the `gc_trace_object` function by calling `gc_trace_ephemeron` from
+[`gc-ephemerons.h`](../api/gc-ephemerons.h). Likewise if the program
+allocates finalizers, it should trace them by calling
+`gc_trace_finalizer` from [`gc-finalizer.h`](../api/gc-finalizer.h).
 
 ### Remembered-set bits
 
@@ -299,6 +301,12 @@ We do this by including the `gc-embedder-api.h` implementation, via
 $(COMPILE) -include foo-embedder.h -o gc-ephemeron.o -c gc-ephemeron.c
 ```
 
+As for ephemerons, finalizers also have their own compilation unit.
+
+```
+$(COMPILE) -include foo-embedder.h -o gc-finalizer.o -c gc-finalizer.c
+```
+
 #### Compile-time options
 
 There are a number of pre-processor definitions that can parameterize
@@ -469,7 +477,7 @@ defined for all collectors:
 
 You can set these options via `gc_option_set_int` and so on; see
 [`gc-options.h`](../api/gc-options.h). Or, you can parse options from
-trings: `heap-size-policy`, `heap-size`, `maximum-heap-size`, and so
+strings: `heap-size-policy`, `heap-size`, `maximum-heap-size`, and so
 on. Use `gc_option_from_string` to determine if a string is really an
 option. Use `gc_option_parse_and_set` to parse a value for an option.
 Use `gc_options_parse_and_set_many` to parse a number of comma-delimited
@@ -669,4 +677,49 @@ An ephemeron association can be removed via `gc_ephemeron_mark_dead`.
 
 ### Finalizers
 
-Not yet implemented!
+A finalizer allows the embedder to be notified when an object becomes
+unreachable.
+
+A finalizer has a priority. When the heap is created, the embedder
+should declare how many priorities there are. Lower-numbered priorities
+take precedence; if an object has a priority-0 finalizer outstanding,
+that will prevent any finalizer at level 1 (or 2, ...) from firing
+until no priority-0 finalizer remains.
+
+Call `gc_attach_finalizer`, from `gc-finalizer.h`, to attach a finalizer
+to an object.
+
+A finalizer also references an associated GC-managed closure object.
+A finalizer's reference to the closure object is strong: if a
+finalizer's closure closure references its finalizable object,
+directly or indirectly, the finalizer will never fire.
+
+When an object with a finalizer becomes unreachable, it is added to a
+queue. The embedder can call `gc_pop_finalizable` to get the next
+finalizable object and its associated closure. At that point the
+embedder can do anything with the object, including keeping it alive.
+Ephemeron associations will still be present while the finalizable
+object is live. Note however that any objects referenced by the
+finalizable object may themselves be already finalized; finalizers are
+enqueued for objects when they become unreachable, which can concern
+whole subgraphs of objects at once.
+
+The usual way for an embedder to know when the queue of finalizable
+object is non-empty is to call `gc_set_finalizer_callback` to
+provide a function that will be invoked when there are pending
+finalizers.
+
+Arranging to call `gc_pop_finalizable` and doing something with the
+finalizable object and closure is the responsibility of the embedder.
+The embedder's finalization action can end up invoking arbitrary code,
+so unless the embedder imposes some kind of restriction on what
+finalizers can do, generally speaking finalizers should be run in a
+dedicated thread instead of recursively from within whatever mutator
+thread caused GC. Setting up such a thread is the responsibility of the
+mutator. `gc_pop_finalizable` is thread-safe, allowing multiple
+finalization threads if that is appropriate.
+
+`gc_allocate_finalizer` returns a finalizer, which is a fresh GC-managed
+heap object. The mutator should then directly attach it to an object
+using `gc_finalizer_attach`. When the finalizer is fired, it becomes
+available to the mutator via `gc_pop_finalizable`.