**TL;DR**: Strings and sequences, like objects and integers, are copied on 
_assignment_. References are not: assigning a reference copies only the 
pointer, not the memory it points to.

Technically, values are always copied when returned from a procedure; there's 
not really any other way.* If values were always passed by pointer/reference, 
how would values stored on the procedure's call stack persist after the 
procedure has returned?

What differs between the types is _what_ is copied. Objects, integers, and 
references are all copied into the caller's frame, but the memory that a 
reference points to is not copied.
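A small sketch of that distinction, using two globals whose names 
(`storedPoint`, `storedRef`) are chosen purely for illustration: returning the 
object hands the caller an independent copy, while returning the ref hands the 
caller a second pointer to the same heap object.

```nim
type
  Point = object
    x, y: int
  PointRef = ref Point

var storedPoint = Point(x: 1, y: 2)
var storedRef = PointRef(x: 1, y: 2)

proc getPoint(): Point =
  # Returning an object copies its fields into the caller's frame
  storedPoint

proc getRef(): PointRef =
  # Returning a ref copies only the pointer; the heap object is shared
  storedRef

var p = getPoint()
p.x = 100            # modifies the caller's copy only
var r = getRef()
r.x = 100            # modifies the single heap object both refs point to

echo storedPoint.x   # 1
echo storedRef.x     # 100
```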

What people often get confused about in Nim (and other low-level languages) is 
the difference between _return_ and _assignment_ semantics. Assignment 
semantics in Nim are quite similar to return semantics, with two** exceptions: 
the string and sequence types. Nim tries to make these types behave like 
arrays, even though both strings and sequences are references to dynamically 
allocated memory. Like objects, strings and sequences copy their contents when 
assigned, as the code below demonstrates: 
    
    
    var a = @[1,2,3]
    var b = a          # copies the contents of a into b
    
    a[0] = 4
    echo "A: ", a      # A: @[4, 2, 3]
    echo "B: ", b      # B: @[1, 2, 3]
    
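The same copy-on-assignment behavior shows up with strings, and with the 
fixed-size arrays that Nim uses as the model for both types. This is a minimal 
sketch; the variable names are arbitrary.

```nim
var s = "hello"
var t = s          # copies the characters into t's own buffer
s[0] = 'j'
echo s             # jello
echo t             # hello

var arr = [1, 2, 3]
var arr2 = arr     # arrays copy their elements on assignment too
arr[0] = 4
echo arr2          # [1, 2, 3]
```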

Now, before you declare that this is an awful design flaw, let me explain. In 
the C backends, the sequence (and string) types are represented roughly like 
this: 
    
    
    type
      # A sequence variable is a pointer to a single heap block holding the
      # length, the capacity, and then the elements themselves.
      SeqData[T] = UncheckedArray[T]
      Sequence[T] = ptr object
        len, cap: int
        data: SeqData[T]
    

Because sequences and strings are mutable, their data array must occasionally 
be resized and reallocated when more space is needed. Since there is no 
guarantee that the reallocated block of memory will start at the same address, 
every reference to the old data anywhere in the program's memory must be 
updated. Under the current scheme, this is a simple operation: because 
sequences and strings are always copied on assignment, there is at most one 
reference to the old data, namely the variable currently holding the 
string/sequence. Other schemes would require tracking every location in memory 
that references a string/sequence, which would be difficult (though not 
impossible; most copying garbage collectors work this way).
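To make the reallocation argument concrete, here is a minimal sketch of a 
hypothetical `ToySeq` type (not Nim's actual implementation) built on the 
`alloc`, `realloc`, `copyMem`, and `dealloc` procedures from `system`. Growth 
may move the buffer, but because each value owns its buffer, only one pointer 
field ever needs updating; the hypothetical `clone` proc plays the role of 
copy-on-assignment.

```nim
# Hypothetical, simplified stand-in for a sequence (not Nim's real seq):
# each value owns a heap buffer of `cap` ints, of which `len` are in use.
type
  ToySeq = object
    len, cap: int
    data: ptr UncheckedArray[int]

proc add(s: var ToySeq, x: int) =
  if s.len == s.cap:
    # Grow the buffer. realloc may return a different address, but since
    # `s` is the only holder of this pointer, updating one field is enough.
    s.cap = max(1, s.cap * 2)
    s.data = cast[ptr UncheckedArray[int]](realloc(s.data, s.cap * sizeof(int)))
  s.data[s.len] = x
  inc s.len

proc clone(s: ToySeq): ToySeq =
  # Copy-on-assignment: the new value gets its own buffer, so the two
  # values never share a pointer that would need to be kept in sync.
  result = ToySeq(len: s.len, cap: s.len)
  if s.len > 0:
    result.data = cast[ptr UncheckedArray[int]](alloc(s.len * sizeof(int)))
    copyMem(result.data, s.data, s.len * sizeof(int))

proc free(s: var ToySeq) =
  # Release the buffer and reset the value to empty
  if s.data != nil:
    dealloc(s.data)
  s = ToySeq()

var a: ToySeq
for i in 0..<10:
  a.add(i)                 # capacity grows 1, 2, 4, 8, 16
var b = clone(a)           # b gets its own copy of the ten elements
a.data[0] = 99             # writing through a leaves b's buffer alone
let aFirst = a.data[0]
let bFirst = b.data[0]
echo aFirst, " ", bFirst   # 99 0
free(a)
free(b)
```

Nim's real sequences handle growth and copying inside the runtime; the toy only 
shows why a single owner per buffer makes the pointer update trivial.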

Now, there are ways to circumvent this behavior:
    

  * Using a reference to a sequence or string
  * Marking the string or sequence as shallow
  * Using the shallowCopy procedure



Of these three, the first is the safest: it allows the sequence or string to 
be resized while still letting multiple parts of the program reference it: 
    
    
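    type SeqRef[T] = ref seq[T]
    
    var seqr: SeqRef[int]
    new(seqr)                # allocate the heap object that holds the seq
    
    seqr[] = @[1, 2, 3]
    var seqr2 = seqr         # copies the reference, not the sequence
    
    seqr[][0] = 4            # dereference explicitly, then index the seq
    # Both lines print the same element address
    echo "Address of sequence reference one: ", repr(addr seqr[][0])
    echo "Address of sequence reference two: ", repr(addr seqr2[][0])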
    

(This is essentially the scheme Python uses for its list type.)
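The address check above shows the sharing, but the main benefit is resizing. 
As a sketch (redefining the same `SeqRef` alias so the block stands alone; the 
variable names are arbitrary), growing the sequence through one reference is 
visible through the other, because both refs point at the same `seq` object 
and only that object holds the data pointer:

```nim
type SeqRef[T] = ref seq[T]

var first: SeqRef[int]
new(first)
first[] = @[1, 2, 3]
var second = first        # copies the reference, not the sequence

# Growing to 100 elements forces the data buffer to be reallocated at
# least once, but only the single seq object behind both refs is updated
for i in 4..100:
  first[].add(i)

echo second[].len         # 100
echo second[][99]         # 100
```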

The second and third options involve shallow operations. Marking a sequence or 
string with the 
[shallow](http://nim-lang.org/docs/system.html#shallow,seq\[T\]) procedure 
bypasses the usual data-copying behavior for all subsequent assignments of that 
sequence, while the 
[shallowCopy](http://nim-lang.org/docs/system.html#shallowCopy,T,T) procedure 
performs a single assignment that bypasses it. 
    
    
    var a, b, c, d, e: seq[int]
    a = @[1,2,3]
    
    # Perform a shallow assignment operation from a to b
    shallowCopy(b, a)
    
    # Perform a normal (copying) assignment from b to c
    c = b
    
    # Make c shallow, then perform shallow assignments to d and e
    shallow(c)
    d = c
    e = d
    
    # a and b print the same address; c, d and e share a different one
    echo "a: ", repr(addr a[0])
    echo "b: ", repr(addr b[0])
    echo "c: ", repr(addr c[0])
    echo "d: ", repr(addr d[0])
    echo "e: ", repr(addr e[0])
    

The problem with shallow operations is that once a sequence or string has been 
shallowly copied, it _must not_ be modified. If it is, you can end up with 
copies of the string that are out of sync. When a shallow sequence is resized, 
only the variable currently being modified has its reference updated; the 
other variables still reference the old data. Although the old data will still 
persist (so you shouldn't get null reference errors), this kind of behavior is 
unpredictable.
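Rather than relying on the real runtime, the sketch below models that 
out-of-sync behavior with two hypothetical types, `Buffer` and `ShallowSeq` 
(not Nim's real data layout): a "shallow" variable is just a pointer to a 
fixed-capacity buffer, and resizing allocates a new buffer and repoints only 
the variable doing the resizing.

```nim
# Hypothetical model (not Nim's real data layout): a shallow sequence
# variable is just a pointer to a heap Buffer with a fixed capacity.
type
  Buffer = ref object
    cap: int
    items: seq[int]    # stands in for the raw element array; len <= cap
  ShallowSeq = object
    buf: Buffer

proc add(s: var ShallowSeq, x: int) =
  if s.buf.items.len == s.buf.cap:
    # "Reallocate": copy the elements into a bigger Buffer and repoint
    # only this variable; other copies keep pointing at the old Buffer
    s.buf = Buffer(cap: s.buf.cap * 2, items: s.buf.items)
  s.buf.items.add(x)

var a = ShallowSeq(buf: Buffer(cap: 3, items: @[1, 2, 3]))
var b = a              # shallow copy: a and b share one Buffer

a.buf.items[0] = 10    # an in-place write is visible through both
echo b.buf.items       # @[10, 2, 3]

a.add(4)               # forces a "reallocation" that only a sees
echo a.buf.items       # @[10, 2, 3, 4]
echo b.buf.items       # @[10, 2, 3]
```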
