I read through the paper "Ownership You Can Count On", on which this idea is 
based, and its positive results seem pretty nebulous: memory use was only 
slightly lower than with GC on average, and performance was not much different 
at all. My point is that neither of these is our primary problem with current 
Nim. Our problem is that the GC is increasingly complex and not very reliable, 
requiring lots of support on issues, and it still doesn't properly support 
multi-threading; adding that capability would increase its complexity and 
worsen its execution speed further. So I commend the objective, but note that 
the paper itself says: "Currently, the major limitation of Gel is that its 
reference counting optimizations can not easily be applied in the presence of 
aliasing arising under multi-threading. As a result, Gel currently only 
supports single-threaded programs." This is developed further down the text, 
with no good solutions offered that don't reintroduce conventional reference 
counting. I would see that as a "stop" stamp on the project from the get-go - 
it fails to address one of the primary reasons we need a better memory manager.

So, let's look at memory management systems other than GC that (kind of) 
work.

Swift uses automatic reference counting (ARC) and likely does some of the 
optimizations envisioned here, since its references are essentially "owned", 
but its current implementation is flawed: it defers all "dispose" operations 
to the end of the reference's scope by building a deferred execution stack 
structure, and can then suffer a huge execution overhead at scope exit when 
disposing of nested structures such as a linked list - even to the point of a 
stack overflow caused by these (stacked) deferred de-allocations. This makes 
de-allocation deterministic but deferred. I filed an issue, but it's not 
resolved yet.

In Rust, C++, and most other solutions, the answer to managing pointers into 
shared memory is reference counting. An ownership concept such as the one 
proposed here can improve efficiency within a particular scope or series of 
nested scopes, but it doesn't really work when references need to be 
maintained both by the creator and by a completely independent "escaped" user 
such as a thread: one can't just pass ownership and expect to get it back, as 
there may be no mechanism to return it - the thread may deliver its result 
through a different channel than a simple return value. For the same reason, 
the concept of owned pointers for closures doesn't work when we want to pass 
closures to threads or outside their scope ("escaping"). The Nim compiler 
already considers closure escape: it puts the environment on the stack for 
non-escaping closures and on the ref heap when they may escape. I think the 
compiler needs to add another case, putting the environment on the shared heap 
when the closure will be used on other threads, even though that makes it 
slower to create and dispose.

So, to me, it looks like, other than successful (big) GCs such as Haskell's 
(which has it easier given that language's mutability constraints), the best 
working model for (semi-)automatic memory management is C++'s: RC for 
multi-threading and for cases where unique/owned ownership doesn't work, with 
weak references for circular dependencies. The main problem with RC is usually 
said to be its execution overhead, so let's write a simple benchmark to show 
its worst-case execution compared to the current GC. The following program 
creates a seq with a million references to a ref object and times creating and 
disposing of all the RC'ed links, compared to the current GC:
    
    
    # imports used for timing
    from times import epochTime
    from math import floor
    
    # imports to do with shared barriers/critical sections
    from locks import Lock, initLock, deinitLock, withLock
    
    let NUMITERS = 1000000
    
    type
      Dummy = ref object
        cntnts: int
    let dummy = Dummy(cntnts: 1) # a single instance of the ref object
    
    GC_fullcollect() # to start with an empty heap...
    
    var tstseq = newSeq[Dummy](NUMITERS) # created outside the loop
    var start = epochTime()
    for i in 0 .. NUMITERS - 1: tstseq[i] = dummy # a bunch of references
    var count = 0
    for i in 0 .. NUMITERS - 1: count += tstseq[i].cntnts # do something
    GC_fullcollect() # to time all accumulated deallocations
    var elpsd = ((epochTime() - start) * 100000.0).floor / 100.0
    echo "Using GC:  Found ", count, " in ", elpsd, " milliseconds."
    
    type
      PtrRC[T] = object
        pntr: ptr T
        refcnt: int
    proc makePtrRC[T](p: ptr T): PtrRC[T] =
      result = PtrRC[T](pntr: p, refcnt: 0)
    proc clone[T](ptrrc: var PtrRC[T]): PtrRC[T] =
      ptrrc.refcnt += 1
      ptrrc
    proc dispose[T](ptrrc: var PtrRC[T]) =
      if ptrrc.refcnt < 2: ptrrc.pntr.dealloc; ptrrc.pntr = nil
      else: ptrrc.refcnt -= 1
    proc `[]`[T](ptrrc: PtrRC[T]): T = ptrrc.pntr[]
    
    type
      DummyObj = object
        cntnts: int
    proc makeDummyPtr(c: int): ptr DummyObj =
      result = createU(DummyObj, 1); result.cntnts = c
    var dummyPtrRC = makePtrRC(makeDummyPtr(1))
    
    var tstseqrc = newSeq[PtrRC[DummyObj]](NUMITERS) # created outside the loop
    start = epochTime()
    for i in 0 .. NUMITERS - 1:
      tstseqrc[i] = dummyPtrRC.clone # a bunch of references
    count = 0
    for i in 0 .. NUMITERS - 1: count += tstseqrc[i][].cntnts # do something
    for i in 0 .. NUMITERS - 1: tstseqrc[i].dispose # dispose all
    elpsd = ((epochTime() - start) * 100000.0).floor / 100.0
    echo "Using non-sharing RC:  Found ", count, " in ", elpsd, " milliseconds."
    
    type
      PtrSharedRC[T] = object
        pntr: ptr T
        refcnt: int
        lckr: Lock
    proc makePtrSharedRC[T](p: ptr T): PtrSharedRC[T] =
      result = PtrSharedRC[T](pntr: p, refcnt: 0); result.lckr.initLock
    proc clone[T](ptrrc: var PtrSharedRC[T]): PtrSharedRC[T] =
      ptrrc.refcnt.atomicInc; ptrrc
    proc dispose[T](ptrrc: var PtrSharedRC[T]) =
      if ptrrc.refcnt < 2: ptrrc.pntr.freeShared; ptrrc.pntr = nil
      else: ptrrc.refcnt.atomicDec
    proc `[]`[T](ptrrc: PtrSharedRC[T]): T = ptrrc.pntr[]
    
    proc makeDummySharedPtr(c: int): ptr DummyObj =
      result = createSharedU(DummyObj, 1); result.cntnts = c
    var dummySharedPtrRC = makePtrSharedRC(makeDummySharedPtr(1))
    
    var tstseqshrdrc = newSeq[PtrSharedRC[DummyObj]](NUMITERS) # created outside the loop
    start = epochTime()
    for i in 0 .. NUMITERS - 1:
      tstseqshrdrc[i] = dummySharedPtrRC.clone # a bunch of references
    count = 0
    for i in 0 .. NUMITERS - 1: count += tstseqshrdrc[i][].cntnts # do something
    for i in 0 .. NUMITERS - 1: tstseqshrdrc[i].dispose # dispose
    elpsd = ((epochTime() - start) * 100000.0).floor / 100.0
    echo "Using sharing RC:  Found ", count, " in ", elpsd, " milliseconds."
    
    

As can be seen in [this Wandbox implementation of the 
above](https://wandbox.org/permlink/j7lW5FuN2CuLoNb4), this naive RC'd 
solution is actually faster than the GC'ed version in its 
non-threaded/non-atomic form, but over twice as slow when atomic reference 
counting is added. This explains why Rust has two implementations from which 
one chooses as appropriate: non-atomic RC (`Rc`) for normal use and atomic RC 
(`Arc`) when threaded, with Rust's concept of lifetimes and ownership taking 
care of "unique pointers".

I can see that applying this "owned ref" idea would make the normal 
non-shared/non-atomic version a little faster, but that isn't really our 
problem: that version is already faster than the GC, speed isn't our 
bottleneck, and the idea doesn't help multi-threading at all. So why put so 
much effort into it?

Why not use the C++ approach: "unique" refs where they can be used efficiently 
and "shared" RC'ed refs where unique ownership doesn't work, with the emphasis 
on getting the RC'ed version out first, until the "owned" version can be 
tested and tweaked for some extra efficiency where warranted?

I know others are working on solutions based on smart pointers, as in unique 
and RC'ed ones, so I won't go further into that, other than to say that that 
proposal would seem more worth our time than this one, which doesn't seem to 
offer solutions to our main problems.
