So David seems to be very interested in building libraries that don't GC. Let's first try to get the requirement stated more precisely, and then let's see if we have a solution for it.
I think it's fair to say that we want the *semantics* of object death to be the same whether object reclamation occurs eagerly or lazily. For example, I don't think we want distinct kinds of finalizer logic for eagerly and lazily collected objects. I also think it's fair to say that we're not so much concerned with GC as we are concerned with knowing an upper bound on GC latency and ensuring that bound is acceptable for our application. In an application where objects may have finalizers, there is no way to drive the bound to zero, because object reclamation (either eager or lazy) requires the execution of user-defined code. So really, we're concerned with the latency to (a) discover the live set, and (b) discover that subset of the deceased object set that requires finalization. Regions *almost* solve this. We can certainly ensure that all of the allocations of interest occur within a known region, and we can very easily be explicit about when that region is reclaimed. There are two cases that will not address that owned pointers *can* address: *Reduction of Liveness*. If we have an existing non-borrowed object reference, and that object reference is overwritten, then the live set for the target region has potentially gotten smaller. Until this occurs, no profit can be had from GC. Note that in region-based allocation, liveness does not shrink on a per-object basis when the stack shrinks - that operation causes liveness to shrink on a per-region basis, which doesn't require a trace operation prior to reclamation. This is something we can check easily, and infer in most cases. *Survival of a Small Subset* Just as Rust wants to hand object graphs across Task boundaries, a region-based system can end up with region bloat by virtue of object return. The pattern is that we build a bunch of things on the way to returning just a few of them, but the fact that *some* are returned causes the region constraints to propagate to everything we built, with the result that the intermediate garbage cannot be collected eagerly on return. The only answer to this in a region-based system is to deepcopy the subgraph being returned - which is not necessarily a bad thing to do. The reason you have to copy it is driven at the type level by breaking the link between regions, but also at the implementation level: you want region reclaim to happen in bulk at the level of the allocator, which means that objects in different regions should not be commingled in a single memory "chunk". ARC isn't really a good answer. The problem with ARC in this situation is that it leads to heap fragmentation in the same way that malloc/free do. The net effect of this is that an object allocated through ARC has to use a completely different *kind* of allocator, which is significantly more expensive than a GC-style allocator. In a surprising number of cases (though not in all), deepcopy-on-return will be faster than ARC, and has the added advantage that it can handle cycles. Note that the deepcopy can be intelligent, in that we know the purpose of deepcopy is for region-driven object survival, and we don't need to copy *or trace* objects that are already in the desired output region. I said in another note that Rust picked the wrong pointer default, and that the sensible behavior should be "infer the regions you can, and GC everything else so that pointers appear to have the expected general behavior". My rationale for this is that you want to keep things simple for beginners, and escalate conceptual complexity only where it has payoff. Designing *good* libraries is surprisingly hard to do; it's not an activity done by everyday programmers. Better to defer confusing complications until they are motivated. Even when an advanced programmer is building a library, it may make sense to defer dealing with allocation strategy until the algorithms are clearly understood. So for the kinds of libraries David is concerned about, my proposal would be to first build the library without concern for GC, and then annotate all of the procedure types at the library interface with something like a * NoGarbage* effect. A procedure satisfies "NoGarbage" if (a) the only live object at the point of return is the one returned by the procedure, and (b) no tracing is performed or induced by the execution of the procedure. If you want this to be a property of the entire library, I don't see any reason not to stick effects on interfaces, with the intended meaning that they apply to all functions in the interface. A procedure is "memoryless" if (a) it is a NoGarbage procedure, and (b) the region of the return result is the caller region, or a region explicitly supplied by the caller. Now when you add this effect constraint on the interface, what happens is that the compiler now has to propagate the effect and make sure all of your code satisfies the effect. And of course some of it won't, and now you start getting messages about all sorts of places where the compiler's attempt to do smart region inference didn't get you what you wanted. And those you go fix. In any case, I think that the compiler can successfully infer a lot of cases where region-based reclaim is possible and sensible, and it can identify and provide useful diagnostics in the cases where it is not. shap
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
