So David seems to be very interested in building libraries that don't GC.
Let's first try to get the requirement stated more precisely, and then
let's see if we have a solution for it.

I think it's fair to say that we want the *semantics* of object death to be
the same whether object reclamation occurs eagerly or lazily. For example,
I don't think we want distinct kinds of finalizer logic for eagerly and
lazily collected objects.

I also think it's fair to say that we're not so much concerned with GC as
we are concerned with knowing an upper bound on GC latency and ensuring
that bound is acceptable for our application. In an application where
objects may have finalizers, there is no way to drive the bound to zero,
because object reclamation (either eager or lazy) requires the execution of
user-defined code. So really, we're concerned with the latency to (a)
discover the live set, and (b) discover that subset of the deceased object
set that requires finalization.

Regions *almost* solve this. We can certainly ensure that all of the
allocations of interest occur within a known region, and we can very easily
be explicit about when that region is reclaimed. There are two cases that
will not address that owned pointers *can* address:

*Reduction of Liveness*. If we have an existing non-borrowed object
reference, and that object reference is overwritten, then the live set for
the target region has potentially gotten smaller. Until this occurs, no
profit can be had from GC. Note that in region-based allocation, liveness
does not shrink on a per-object basis when the stack shrinks - that
operation causes liveness to shrink on a per-region basis, which doesn't
require a trace operation prior to reclamation. This is something we can
check easily, and infer in most cases.

*Survival of a Small Subset* Just as Rust wants to hand object graphs
across Task boundaries, a region-based system can end up with region bloat
by virtue of object return. The pattern is that we build a bunch of things
on the way to returning just a few of them, but the fact that *some* are
returned causes the region constraints to propagate to everything we built,
with the result that the intermediate garbage cannot be collected eagerly
on return.

The only answer to this in a region-based system is to deepcopy the
subgraph being returned - which is not necessarily a bad thing to do. The
reason you have to copy it is driven at the type level by breaking the link
between regions, but also at the implementation level: you want region
reclaim to happen in bulk at the level of the allocator, which means that
objects in different regions should not be commingled in a single memory
"chunk".

ARC isn't really a good answer. The problem with ARC in this situation is
that it leads to heap fragmentation in the same way that malloc/free do.
The net effect of this is that an object allocated through ARC has to use a
completely different *kind* of allocator, which is significantly more
expensive than a GC-style allocator. In a surprising number of cases
(though not in all), deepcopy-on-return will be faster than ARC, and has
the added advantage that it can handle cycles. Note that the deepcopy can
be intelligent, in that we know the purpose of deepcopy is for
region-driven object survival, and we don't need to copy *or trace* objects
that are already in the desired output region.


I said in another note that Rust picked the wrong pointer default, and that
the sensible behavior should be "infer the regions you can, and GC
everything else so that pointers appear to have the expected general
behavior". My rationale for this is that you want to keep things simple for
beginners, and escalate conceptual complexity only where it has payoff.
Designing *good* libraries is surprisingly hard to do; it's not an activity
done by everyday programmers. Better to defer confusing complications until
they are motivated. Even when an advanced programmer is building a library,
it may make sense to defer dealing with allocation strategy until the
algorithms are clearly understood.

So for the kinds of libraries David is concerned about, my proposal would
be to first build the library without concern for GC, and then annotate all
of the procedure types at the library interface with something like a *
NoGarbage* effect. A procedure satisfies "NoGarbage" if (a) the only live
object at the point of return is the one returned by the procedure, and (b)
no tracing is performed or induced by the execution of the procedure. If
you want this to be a property of the entire library, I don't see any
reason not to stick effects on interfaces, with the intended meaning that
they apply to all functions in the interface.

A procedure is "memoryless" if (a) it is a NoGarbage procedure, and (b) the
region of the return result is the caller region, or a region explicitly
supplied by the caller.

Now when you add this effect constraint on the interface, what happens is
that the compiler now has to propagate the effect and make sure all of your
code satisfies the effect. And of course some of it won't, and now you
start getting messages about all sorts of places where the compiler's
attempt to do smart region inference didn't get you what you wanted. And
those you go fix.


In any case, I think that the compiler can successfully infer a lot of
cases where region-based reclaim is possible and sensible, and it can
identify and provide useful diagnostics in the cases where it is not.



shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to