**TL;DR**: Strings and sequences, like objects and integers, are copied on
_assignment_. References are not: only the reference itself is copied, never
the memory it points to.
Technically, values are always copied when returned from a procedure - there's
not really any other way.* If values were always passed by pointer/reference,
how would values stored on the procedure's call stack persist after the
procedure has returned?
What differs between the types is _what_ is copied. Though objects, integers,
and references are all copied to the previous procedure frame, the memory
referenced by references is not copied.
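As a minimal sketch of that distinction (the `Point` and `PointRef` types and both procedures are hypothetical names made up for this illustration), returning an object hands the caller its own copy, while returning a reference copies only the reference:

```nim
type
  Point = object
    x, y: int
  PointRef = ref Point

# Returns an object: the whole value is copied out to the caller.
proc makePoint(): Point =
  result = Point(x: 1, y: 2)

# Returns a reference: only the reference is copied, not the heap object.
proc makeRef(): PointRef =
  new(result)
  result.x = 1
  result.y = 2

var p = makePoint()
var q = p        # q is an independent copy of p
q.x = 10
echo "p.x: ", p.x   # p.x: 1

var r = makeRef()
var s = r        # s refers to the same heap object as r
s.x = 10
echo "r.x: ", r.x   # r.x: 10
```

Here `q` is a separate value, whereas `r` and `s` are two copies of one reference to the same memory.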
What people often get confused about in Nim (and other low-level languages) is
the difference between _return_ and _assignment_ semantics. Assignment
semantics in Nim are quite similar to return semantics, with two** exceptions:
string and sequence types. Nim attempts to make these types semantically
similar to arrays, even though both strings and sequences are references to
dynamically allocated memory. Strings and sequences, like objects, copy their
contents when assigned. This can be demonstrated by the code below:
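```nim
var a = @[1,2,3]
var b = a          # b receives its own copy of the data
a[0] = 4           # only a is modified
echo "A: ", a      # A: @[4, 2, 3]
echo "B: ", b      # B: @[1, 2, 3]
```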
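Strings behave the same way. A small sketch along the same lines (the variable names are arbitrary):

```nim
var s = "hello"
var t = s          # t receives its own copy of the characters
s[0] = 'j'         # only s is modified
echo "s: ", s      # s: jello
echo "t: ", t      # t: hello
```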
Now, before you declare that this is an awful design flaw, I have the following
explanation. For the C backends, the sequence (and string) types are
represented roughly like this:
```nim
type
  # An array of sequence data dynamically allocated at runtime.
  SeqData{.unchecked.}[T] = array[0..0, T]
  Sequence[T] = object
    len, cap: int
    data: SeqData[T]
```
Since sequences and strings are mutable, their array of data must be
occasionally resized and reallocated when space is needed. Since there is no
guarantee that the reallocated block of memory will have the same pointer, all
references to the old data throughout the entire program's memory must be
updated. Under the current scheme, this is a simple operation. Since sequence
and string types are always copied on assignment, there is always at most one
reference to the old data - the current variable holding the string/sequence.
Other schemes would require tracking every location in memory from which a
string/sequence is referenced, which would be difficult (though not impossible,
as most copying-style garbage collectors function like this).
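To illustrate why a single owner makes reallocation simple, here is a rough sketch of a growable buffer. `Buf`, `initBuf`, and `push` are hypothetical names, and this is not how Nim's runtime actually implements sequences; it only assumes the `alloc`, `realloc`, and `dealloc` procedures from the `system` module.

```nim
type
  Buf = object
    len, cap: int
    data: ptr UncheckedArray[int]

# Allocate an empty buffer with room for `cap` integers.
proc initBuf(cap: int): Buf =
  result.len = 0
  result.cap = cap
  result.data = cast[ptr UncheckedArray[int]](alloc(cap * sizeof(int)))

# Append a value, doubling the capacity when the buffer is full.
proc push(b: var Buf, x: int) =
  if b.len == b.cap:
    b.cap *= 2
    # realloc may move the block. Only b.data is updated here, which is
    # safe only because b is the sole holder of the data pointer.
    b.data = cast[ptr UncheckedArray[int]](realloc(b.data, b.cap * sizeof(int)))
  b.data[b.len] = x
  inc b.len

var buf = initBuf(2)
for i in 1..5:
  buf.push(i)

echo "len: ", buf.len, ", cap: ", buf.cap   # len: 5, cap: 8
echo "last: ", buf.data[buf.len - 1]        # last: 5
dealloc(buf.data)
```

If a second variable held the same `data` pointer, `push` would have no way of finding and updating it after the block moved.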
Now, there are ways to circumvent this behavior:
* Using a reference to a sequence or string
* Marking the string or sequence as shallow
* Using the shallowCopy procedure
Of these three, the first is the safest, allowing the string or sequence to be
resized while also allowing it to be referenced by multiple parts of the
program:
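```nim
type SeqRef[T] = ref seq[T]

var seqr: SeqRef[int]
new(seqr)
seqr[] = @[1, 2, 3]

var seqr2 = seqr   # copies the reference, not the sequence
seqr[0] = 4        # visible through seqr2 as well

# Both lines print the same address, since both references share one sequence.
echo "Address of sequence reference one: ", repr(addr seqr[][0])
echo "Address of sequence reference two: ", repr(addr seqr2[][0])
```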
(This is the scheme used by Python for its list type.)
The second and third options involve using shallow operations. Marking a
sequence or string with the
[shallow](https://nim-lang.org/docs/system.html#shallow,seq\[T\])
procedure will bypass the usual data-copying behavior for all further
assignments from that sequence, while using the
[shallowCopy](https://nim-lang.org/docs/system.html#shallowCopy,T,T)
procedure will perform a single assignment operation that bypasses the
behavior.
```nim
var a, b, c, d, e: seq[int]
a = @[1,2,3]

# Perform a shallow assignment operation from a to b
shallowCopy(b, a)

# Perform a normal (copying) assignment from b to c
c = b

# Make c shallow, then perform shallow assignments to d and e
shallow(c)
d = c
e = d

echo "a: ", repr(addr a[0])
echo "b: ", repr(addr b[0])
echo "c: ", repr(addr c[0])
echo "d: ", repr(addr d[0])
echo "e: ", repr(addr e[0])
```
The problem with shallow operations is that once a sequence or string has been
shallowly copied, it _must not_ be modified. If it is, you can end up with
copies of the string or sequence that are out of sync. When a shallow sequence
is resized, only the variable currently being modified has its reference
updated; the other variables will still have references to the old data. Though
the old data will still persist (so you shouldn't get null reference errors),
this kind of behavior is unpredictable.