I read through the paper "Ownership You Can Count On", on which this idea is
based, and its positive results seem fairly nebulous: memory use was only
slightly lower than under GC on average, and performance was not significantly
different. My point is that neither of these is our primary problem with
current Nim. Our problem is that the GC is increasingly complex and not
entirely reliable, requiring a lot of support on issues, and it still doesn't
support multi-threading properly; adding that capability would increase its
complexity and worsen its execution speed. So I commend the objective, but
note that the paper itself says: "Currently, the major limitation of Gel is
that its reference counting optimizations can not easily be applied in the
presence of aliasing arising under multi-threading. As a result, Gel currently
only supports single-threaded programs." This is developed further in the text
with no good solutions that don't reintroduce conventional reference counting.
To me that is a "stop" stamp on the project from the get-go: it leaves one of
the primary reasons we need a better memory manager very difficult to
overcome.
So, let's look at memory management systems other than GC that (kind of)
work.
Swift uses automatic reference counting and likely does some of the
optimizations envisioned here, since its references are essentially "owned",
but its current implementation is flawed: it defers all "dispose" operations
to the point where the reference goes out of scope by building a
deferred-execution stack structure, and can therefore suffer a huge execution
overhead at the end of a scope when disposing of nested structures such as a
linked list, even to the point of a stack overflow caused by the stacked
deferred de-allocations. This makes de-allocation deterministic but deferred.
I filed an issue, but it's not resolved yet.
For Rust, C++, and most other solutions, the answer to managing pointers into
shared memory is reference counting. An ownership concept such as the one
proposed here can improve efficiency within a particular scope or series of
nested scopes, but it doesn't really work when references need to be
maintained both by the creator and by a completely independent "escaped" user
such as a thread: one can't just pass the ownership and expect to get it back,
as there may be no mechanism to return it, and the thread may deliver its
result through a different channel than a simple return value. For the same
reason, the concept of owned pointers for closures doesn't work when we want
to pass closures to threads or outside their scope ("escaping"). The Nim
compiler already considers closure escape: it puts the environment on the
stack for non-escaping closures and on the ref heap when they may escape. I
think the compiler needs to add another case where the environment is put on
the shared heap when the closure will be used on other threads, even though
that makes such closures slower to create and dispose.
So, to me, it looks like other than successful (big) GCs like Haskell's
(which has it easier given its mutability constraints), the best working model
for (semi-)automatic memory management is C++'s: RC for multi-threading and
for cases where unique/owned pointers don't work, with weak references for
circular dependencies. The main problem with RC is usually said to be its
execution overhead, so let's write a simple benchmark to show its worst-case
execution time compared to the current GC. The following program creates a seq
with a million references to a ref object and times creating and disposing of
all the RC'ed links, compared to the current GC:
# imports used for timing
from times import epochTime
from math import floor
# imports to do with shared barriers/critical sections
from locks import Lock, initLock, deinitLock, withLock

let NUMITERS = 1000000

type
  Dummy = ref object
    cntnts: int

let dummy = Dummy(cntnts: 1) # a single instance of the ref object

GC_fullCollect() # to start with an empty heap...
var tstseq = newSeq[Dummy](NUMITERS) # created outside the loop
var start = epochTime()
for i in 0 .. NUMITERS - 1: tstseq[i] = dummy # a bunch of references
var count = 0
for i in 0 .. NUMITERS - 1: count += tstseq[i].cntnts # do something
GC_fullCollect() # to time all accumulated deallocations
var elpsd = ((epochTime() - start) * 100000.0).floor / 100.0
echo "Using GC: Found ", count, " in ", elpsd, " milliseconds."
type
  PtrRC[T] = object # naive non-atomic RC'ed pointer; counts are per-copy,
    pntr: ptr T     # so this only measures inc/dec overhead, not correctness
    refcnt: int

proc makePtrRC[T](p: ptr T): PtrRC[T] =
  result = PtrRC[T](pntr: p, refcnt: 0)

proc clone[T](ptrrc: var PtrRC[T]): PtrRC[T] =
  ptrrc.refcnt += 1
  ptrrc

proc dispose[T](ptrrc: var PtrRC[T]) =
  if ptrrc.refcnt < 2: ptrrc.pntr.dealloc; ptrrc.pntr = nil
  else: ptrrc.refcnt -= 1

proc `[]`[T](ptrrc: PtrRC[T]): T = ptrrc.pntr[]
type
  DummyObj = object
    cntnts: int

proc makeDummyPtr(c: int): ptr DummyObj =
  result = createU(DummyObj, 1); result.cntnts = c

var dummyPtrRC = makePtrRC(makeDummyPtr(1))
var tstseqrc = newSeq[PtrRC[DummyObj]](NUMITERS) # created outside the loop

start = epochTime()
for i in 0 .. NUMITERS - 1:
  tstseqrc[i] = dummyPtrRC.clone # a bunch of references
count = 0
for i in 0 .. NUMITERS - 1: count += tstseqrc[i][].cntnts # do something
for i in 0 .. NUMITERS - 1: tstseqrc[i].dispose # dispose all
elpsd = ((epochTime() - start) * 100000.0).floor / 100.0
echo "Using non-sharing RC: Found ", count, " in ", elpsd, " milliseconds."
type
  PtrSharedRC[T] = object # naive atomic (thread-safe-counted) RC'ed pointer
    pntr: ptr T
    refcnt: int
    lckr: Lock

proc makePtrSharedRC[T](p: ptr T): PtrSharedRC[T] =
  result = PtrSharedRC[T](pntr: p, refcnt: 0); result.lckr.initLock

proc clone[T](ptrrc: var PtrSharedRC[T]): PtrSharedRC[T] =
  ptrrc.refcnt.atomicInc; ptrrc

proc dispose[T](ptrrc: var PtrSharedRC[T]) =
  if ptrrc.refcnt < 2: ptrrc.pntr.freeShared; ptrrc.pntr = nil
  else: ptrrc.refcnt.atomicDec

proc `[]`[T](ptrrc: PtrSharedRC[T]): T = ptrrc.pntr[]

proc makeDummySharedPtr(c: int): ptr DummyObj =
  result = createSharedU(DummyObj, 1); result.cntnts = c

var dummySharedPtrRC = makePtrSharedRC(makeDummySharedPtr(1))
var tstseqshrdrc = newSeq[PtrSharedRC[DummyObj]](NUMITERS) # created outside the loop

start = epochTime()
for i in 0 .. NUMITERS - 1:
  tstseqshrdrc[i] = dummySharedPtrRC.clone # a bunch of references
count = 0
for i in 0 .. NUMITERS - 1: count += tstseqshrdrc[i][].cntnts # do something
for i in 0 .. NUMITERS - 1: tstseqshrdrc[i].dispose # dispose all
elpsd = ((epochTime() - start) * 100000.0).floor / 100.0
echo "Using sharing RC: Found ", count, " in ", elpsd, " milliseconds."
As can be seen in [this Wandbox implementation of the
above](https://wandbox.org/permlink/j7lW5FuN2CuLoNb4), the execution speed of
this naive RC'd solution is actually faster than the GC'ed version for the
non-threaded/non-atomic implementation, but over twice as slow when atomic
reference counting is added. This explains why Rust has two implementations
from which one chooses as appropriate: non-atomic RC (Rc) for normal use and
atomic RC (Arc) when threaded, with Rust's concept of lifetimes and ownership
taking care of "unique pointers".
I can see that applying this "owned ref" idea would make the normal
non-shared/non-atomic version a little faster, but that isn't really our
problem: that version is already faster than the GC, and the idea doesn't help
multi-threading at all. So why put so much effort into it?
Why not use the C++ approach: "unique" refs where they can be used
efficiently, and "shared" RC'ed refs where unique ownership doesn't work, with
the emphasis on getting the RC'ed version out first, until the "owned" version
can be tested and tweaked for extra efficiency where warranted?
I know others are working on solutions based on smart pointers (unique and
RC'ed), so I won't go further into that, other than to say that that proposal
would seem to be more worth our time than this one, which doesn't seem to
offer solutions to our main problems.