On Fri, Oct 11, 2013 at 9:41 PM, Bennie Kloosteman <[email protected]>wrote:
> I havent read the papers ( I will ) but my issue for reference counting > has always been concurency .. eg > > AtomicIncrement(&refPtr) > > if (AtomicDecrement(&refPtr) - == 0 ) > delete > > problem is the atomics become LOCK CMPXCHG > INCL/DECL as you said later. But I don't think so. So first, let me say that I'm not proposing to rely on any *semantics* of reference counting. The model is still that reclamation *conceptually* occurs out of band, and may be delayed. Even with reference counting, it *will* be delayed, because the fast reference counting implementations all rely on deferred updates. In any case, I still want to admit implementations built on conventional GC - reference counting has its own forms of overhead. My first reaction to the paper was much the same as yours: concurrency issues are going to be a pain here. You're thinking about x86, but I'm actually more worried about ARM, where we have weak memory ordering to cope with. But I think that (a) you are looking at the wrong question, and (b) there are *many* factors that offset your concerns. Both of those things can be true, and you may still turn out to be right. I just want to lay out here why there may be some grounds for hope. The reason I say you are asking the wrong question is that this is a qualitatively different kind of contention than reader/writer contention. When an object is accessed concurrently, the majority of pointer operations will be copying pointers from that object into temporaries. The overwhelming majority of those temporaries will be init-only references, which don't require counts. Borrowing more generally is an optimization that hasn't been attempted by the RC-immix work. If those aren't enough, the worst outcome is that hot shared objects will very quickly reach their * stuck* count. At that point, the object gets collected by GC or by region reclamation. Objects in new space aren't counted. That also helps a lot. The other confusing issue here is that the fast reference counting algorithms all rely on deferred updates. Basically, you stick pairs of the form (overwritten reference, new reference) into a per-CPU append-only list, and then batch the count updates. If you sort the queues before applying them, a lot of increments and decrements cancel before being applied, and a bunch more get run-length compressed. With a bit of care, you can further reduce the atomicity contention, because the thread applying the updates need not be the thread that performed the pointer stores. There are no count operations on NULL references, so there is no decrement during initialization or increment during retirement of an object. So now think of the deferred counting queue as a kind of generational optimization. During that generation, you can copy a pointer around as much as you want. As long as you drop it before the generation comes to a close, the count on the actual object will not change. Init-only pointers play a *huge* role here. Suppose object A holds an init-only pointer to B, and I hold a pointer to A. Since the pointer to B is init-only, we know that lifetime(B) >= lifetime(A). In consequence, any pointer P having the property that lifetime(P) <= lifetime(A) need not be reference counted. A lot of those cases can be discovered by purely local inference. There are other obvious cases for local *copies* of pointers to A or B. Those cases cover most uses of temporaries, so the number of reference counts required is rather less than one might initially expect. In fact, you can probably optimize the counting quite a bit by means of an optimization that introduces "covering captures", where you introduce a counted capture that eliminates the need to count a bunch of other captures. But to be clear, there *will* be issues from the kinds of concerns you raise, and the need for cycle reclamation and/or stuck-count object reclamation *does* mean that there is a rate of heap leakage. That means, in turn, that we need a low-and-slow concurrent background GC, we want some kind of type-supported mechanism for explicit arenas, and that we need tools that let us see what's going on. The whole purpose of the borrowed pointers seem to me to run it in a single > threaded task to remove this concurrency cost surely there are better ways > than borrowed pointers and hence being tied to an Actor implimentation.. > I'm not sure why borrowed pointers tie you to an actor implementation. I * definitely* think that their utility extends far beyond single-thread uses. Later you mentioned CIL. If we are talking about reference counts, we aren't talking about an implementation on CIL. shap
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
