Though I introduced *GCAlloc* and *DynamicRegionAlloc* as possible effects
of interest, these are ill-conceived. *GCAlloc* is a special case of
*DynamicRegionAlloc*, because in a region-based system, the GC heap is a
region.

Let me start this by identifying a problem with Rust's owned pointers: they
can't be used to build *acyclic* heap structures. The reason is that owned
pointers are required to be unique. That, in turn, means that you cannot do
something of the form:

   let o1 : ~Object1 = new Object1();
       o2 : ~Object2 = new Object2();
       oContaining : ~Object3 = new Object3(o1, o2) in
    ...

The closest you can get to that is:

   let o1 : ~Object1 = new Object1()
       o2 : ~Object2 = new Object2() in
     let oContaining : ~Object3(&o1, &o2) in
       ...
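
For concreteness, here is roughly what that workaround looks like in
current Rust syntax (a sketch only: Box has since replaced ~, and the
Object types are hypothetical stand-ins, not from any real library):

   struct Object1;
   struct Object2;
   struct Object3<'a> { o1: &'a Object1, o2: &'a Object2 }

   fn demo() {
       let o1: Box<Object1> = Box::new(Object1);
       let o2: Box<Object2> = Box::new(Object2);
       // oContaining can only *borrow* o1 and o2, so it is pinned to
       // this lexical scope; recursively unique ownership is never
       // actually expressed.
       let o_containing = Object3 { o1: &o1, o2: &o2 };
       let _ = o_containing;
   }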

That is: you are forced into an on-stack lexical structure when what you
really wanted was to express recursively unique ownership. Linear types
combined with liveness analysis can do this, but they are hard to use.
First-class regions can do this, and are relatively easy to use. Adapting
David Gay's syntax (from his dissertation):

   let region %r
       o1 = new (%r) Object1()
       o2 = new (%r) Object2()
       oContaining = new (%r) Object3(o1, o2) in
     ...

where the unit of release becomes the *region* rather than the object. It
even supports cyclic structures, and if explicit regions are supported by
the language you can effectively prevent extraneous objects from getting
tangled up in this region. In the abstract, no GC is required for release,
except to the extent that you end up with the same finalization logic that
is traditionally associated with GC systems. It is possible - though not
always advisable - to run finalizers promptly on region release.
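
For those who want to experiment, a bump arena approximates this in Rust
today. The sketch below uses the bumpalo crate as a stand-in for a
first-class region (with the caveat that bumpalo does not run destructors
on release, so the finalization question simply doesn't arise there):

   use bumpalo::Bump;

   struct Object1;
   struct Object2;
   struct Object3<'r> { o1: &'r Object1, o2: &'r Object2 }

   fn demo() {
       let r = Bump::new();                        // let region %r
       let o1 = r.alloc(Object1);                  // new (%r) Object1()
       let o2 = r.alloc(Object2);                  // new (%r) Object2()
       let o_containing = r.alloc(Object3 { o1, o2 });
       let _ = o_containing;
   }                                               // whole region released here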

Note that in:

   let region %r
       o1 = new (%r) Object1()
       o2 = new (%r) Object2()
       oContaining = new Object3(o1, o2) in // unconstrained, fails
     return oContaining

The return of oContaining means that its region must be at least as big as
the caller's region. Since the references o1 and o2 have shorter lifespans,
the construction of Object3 fails.
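
The Rust borrow checker enforces the same rule for the arena
approximation; this sketch (hypothetical types again, bumpalo standing in
for a region) is rejected for exactly the reason just described:

   use bumpalo::Bump;

   struct Object1;

   // Does not compile: the returned reference would outlive the region
   // it points into, i.e. the region is not as big as the caller's.
   fn escape<'a>() -> &'a Object1 {
       let r = Bump::new();
       r.alloc(Object1)   // ERROR: `r` does not live long enough
   }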

Note that regions strictly subsume "owned" pointers. Rust-style ownership,
which might better be described as "defined release point pointers" than
"unique pointers", can be *approximated* by:

   let region %r1
       o1 = new (%r1) Object1()
       region %r2
       o2 = new (%r2) Object2()
       container = Object3(&o1, &o2) // yes, address-of
       oContaining = &container in   // just to show we can get a reference
     ...
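
In arena-approximated Rust, that nesting might look like the following (a
sketch; %r1 and %r2 become two separate arenas with staggered lifetimes):

   use bumpalo::Bump;

   struct Object1;
   struct Object2;
   struct Object3<'r> { o1: &'r Object1, o2: &'r Object2 }

   fn demo() {
       let r1 = Bump::new();
       let o1 = r1.alloc(Object1);
       {
           let r2 = Bump::new();
           let o2 = r2.alloc(Object2);
           let container = Object3 { o1, o2 };   // on the stack
           let o_containing = &container;        // just to show we can
           let _ = o_containing;
       }                                         // %r2 dies before %r1
   }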

Or, if you want the *exact* semantics of Rust unique pointers, why not just
stack allocate, because *every statically sized owned object can be stack
allocated!*:

   let o1 = Object1()
       o2 = Object2()
       region %r3
       oContaining = new (%r3) Object3(o1, o2) in
     ...

Of course, we don't even need to write that much, because the region
ordering constraint on oContaining would be inferred if we simply wrote:

   let o1 = Object1()
       o2 = Object2()
       oContaining = new (%r3) Object3(o1, o2) in
     ...
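
The Rust rendering of the stack-allocated version is even more
anticlimactic: everything is a plain value (a sketch, hypothetical types):

   struct Object1;
   struct Object2;
   struct Object3 { o1: Object1, o2: Object2 }

   fn demo() {
       let o1 = Object1;                       // plain stack value
       let o2 = Object2;
       let o_containing = Object3 { o1, o2 };  // moved in; still on the stack
       let _ = o_containing;
   }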

Speaking of which, a proper polymorphic region system subsumes borrowed
pointers as well. So, now, can somebody explain to me again what the unique
pointer concept is actually good for?
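
(For the Rust-minded: a region-polymorphic procedure is essentially what
Rust spells as a lifetime-polymorphic one, which is all a borrowed pointer
buys you. A sketch, with a hypothetical type:)

   struct Object1;

   // Polymorphic over the region/lifetime 'r: callable with references
   // into any region that outlives the call.
   fn pick<'r>(a: &'r Object1, b: &'r Object1, first: bool) -> &'r Object1 {
       if first { a } else { b }
   }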

Which brings me to one of the pitfalls of first-class regions...


The appeal of first-class regions is that you can gather a bunch of objects
together and know that they die as a group. So far so good. The
*disadvantage* of first-class regions is that you have an explicit new()
operator. This can lead to unbounded allocation within the region (e.g.
inside a *for* loop), and in turn can induce a requirement for
region-by-region GC. You can actually get the same effect with Rust's owned
pointers, because the owned pointer only gets you eager deallocation of the
top-level object in a heap structure. In fact, if I understand the
semantics of owned pointers in Rust correctly, *every owned allocation
should have been stack allocated* (unless the object had dynamic size). But
if you wanted stack allocation, why didn't you just *do* that, and let the
region system ensure that you could safely take the address of the thing
you put on the stack without the pointer escaping? *That* is the
requirement that "borrowed" pointers don't solve - and the reason we were
moving to a real, honest-to-god region system in BitC.
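
To make the pitfall concrete: in the arena approximation, the sketch below
allocates on every iteration, and without the per-iteration reset (with
bumpalo's reset standing in for region GC) the region grows without bound:

   use bumpalo::Bump;

   fn process(items: &[u32]) {
       let mut scratch = Bump::new();
       for item in items {
           scratch.reset();   // without this, the region grows unboundedly
           let tmp = scratch.alloc([*item; 64]);   // per-iteration garbage
           // ... use tmp, which dies with this iteration ...
           let _ = tmp;
       }
   }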

Now for the good news:

   1. The compiler can usually determine whether a region may have
   decreasing liveness over its lifespan.
   2. Region GC is a *much* more bounded problem than general GC, and it
   notably *doesn't* require a mark pass on the heap.
   3. Unless a region is explicitly shared across threads, region GC for
   any region excluding the global region[s] is a thread-local collection.

Which brings me to the grand punch line of this note. Why do I say "global
region[s]" instead of "global region"?

Regions have lifetime relationships. The general (GCd) heap is usually
viewed as the maximal region, in the sense that the life span of objects in
that region is [conservatively] pessimal. It's not that hard to define
*other* global regions that are *non*-pessimal. By definition, there can be
no reference from the GC heap pointing into any lesser global region. And
in many cases we can show that the lesser global region never needs to be
GCd. Not only that, we can establish things like per-thread heaps that can
be eagerly collected when a service thread goes away.
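
A per-thread heap of this kind is easy to sketch with today's tools; here
the arena is created inside the service thread and reclaimed, with no
global GC involved, the moment the thread exits:

   use bumpalo::Bump;
   use std::thread;

   fn main() {
       let worker = thread::spawn(|| {
           let heap = Bump::new();   // thread-private lesser region
           let buf = heap.alloc([0u8; 1024]);
           let _ = buf;
           // ... service loop ...
       });                           // `heap` freed eagerly at thread exit
       worker.join().unwrap();
   }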

Food for thought, eh Ian?

So now, dragging this all the way back to the *GCAlloc* and
*DynamicRegionAlloc* effects, I think we would be better served with two
different effects that replace these:

   - *Reduces(%r)*, with the intended meaning that the code takes some step
   that potentially reduces the liveness of region %r, and region %r survives
   the return of the procedure. That is: it is a possible consequence of
   executing this procedure that new garbage exists after it returns; a GC may
   discover newly reclaimable storage in some surviving region that is not
   older than %r.
   - *MayGC*, with the intended meaning that the internal behavior of the
   called procedure may cause it to induce a GC on its own privately created
   regions. That is: on regions that do *not* survive the exit of the
   procedure.

The first of these is a statement about the surviving heap after the
procedure returns. The second is a statement about the execution time
variance of the called procedure. It may turn out that the second one
should be subsumed under some other effect that deals more generally with
intuitions about time-bounded behavior.


shap