On Fri, Oct 11, 2013 at 9:41 PM, Bennie Kloosteman <[email protected]>wrote:

> I havent read the papers ( I will ) but my issue for reference counting
> has always been concurency ..  eg
>
> AtomicIncrement(&refPtr)
>
> if (AtomicDecrement(&refPtr) - == 0 )
>     delete
>
> problem is the atomics become LOCK CMPXCHG
>

INCL/DECL as you said later. But I don't think so.

So first, let me say that I'm not proposing to rely on any *semantics* of
reference counting. The model is still that reclamation *conceptually* occurs
out of band, and may be delayed. Even with reference counting, it *will* be
delayed, because the fast reference counting implementations all rely on
deferred updates. In any case, I still want to admit implementations built
on conventional GC - reference counting has its own forms of overhead.

My first reaction to the paper was much the same as yours: concurrency
issues are going to be a pain here. You're thinking about x86, but I'm
actually more worried about ARM, where we have weak memory ordering to cope
with. But I think that (a) you are looking at the wrong question, and (b)
there are *many* factors that offset your concerns. Both of those things
can be true, and you may still turn out to be right. I just want to lay out
here why there may be some grounds for hope.

The reason I say you are asking the wrong question is that this is a
qualitatively different kind of contention than reader/writer contention.
When an object is accessed concurrently, the majority of pointer operations
will be copying pointers from that object into temporaries. The
overwhelming majority of those temporaries will be init-only references,
which don't require counts. Borrowing more generally is an optimization
that hasn't been attempted by the RC-immix work. If those aren't enough,
the worst outcome is that hot shared objects will very quickly reach their *
stuck* count. At that point, the object gets collected by GC or by region
reclamation.

Objects in new space aren't counted. That also helps a lot.

The other confusing issue here is that the fast reference counting
algorithms all rely on deferred updates. Basically, you stick pairs of the
form (overwritten reference, new reference) into a per-CPU append-only
list, and then batch the count updates. If you sort the queues before
applying them, a lot of increments and decrements cancel before being
applied, and a bunch more get run-length compressed. With a bit of care,
you can further reduce the atomicity contention, because the thread
applying the updates need not be the thread that performed the pointer
stores.

There are no count operations on NULL references, so there is no decrement
during initialization or increment during retirement of an object.

So now think of the deferred counting queue as a kind of generational
optimization. During that generation, you can copy a pointer around as much
as you want. As long as you drop it before the generation comes to a close,
the count on the actual object will not change.

Init-only pointers play a *huge* role here. Suppose object A holds an
init-only pointer to B, and I hold a pointer to A. Since the pointer to B
is init-only, we know that lifetime(B) >= lifetime(A). In consequence, any
pointer P having the property that lifetime(P) <= lifetime(A) need not be
reference counted. A lot of those cases can be discovered by purely local
inference. There are other obvious cases for local *copies* of pointers to
A or B. Those cases cover most uses of temporaries, so the number of
reference counts required is rather less than one might initially expect.
In fact, you can probably optimize the counting quite a bit by means of an
optimization that introduces "covering captures", where you introduce a
counted capture that eliminates the need to count a bunch of other captures.


But to be clear, there *will* be issues from the kinds of concerns you
raise, and the need for cycle reclamation and/or stuck-count object
reclamation *does* mean that there is a rate of heap leakage. That means,
in turn, that we need a low-and-slow concurrent background GC, we want some
kind of type-supported mechanism for explicit arenas, and that we need
tools that let us see what's going on.

The whole purpose of the borrowed pointers seem to me to run it in a single
> threaded task to remove this concurrency cost surely there are better ways
> than borrowed pointers and hence being tied to an Actor implimentation..
>

I'm not sure why borrowed pointers tie you to an actor implementation. I *
definitely* think that their utility extends far beyond single-thread uses.


Later you mentioned CIL. If we are talking about reference counts, we
aren't talking about an implementation on CIL.


shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to