On Tuesday, 4 February 2014 at 22:30:39 UTC, Walter Bright wrote:
> I wonder how Rust deals with this.

The only time ownership matters is if you are going to store the pointer. It is like the difference between a container and a range.

An algorithm doesn't need to know about the specifics of a container. Let's use average for example. We might write it in D:

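import std.range.primitives; // gives arrays empty/front/popFront and provides isInputRange

int average(InputRange)(InputRange r) if (isInputRange!InputRange) {
    int count = 0;
    int sum = 0;
    while(!r.empty) {
        count++;
        sum += r.front;
        r.popFront();
    }
    return count == 0 ? 0 : sum / count; // avoid dividing by zero on an empty range
}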

Now, this being a template, D will generate new code for each range type it is used with... but even if we replaced InputRange with a specific type, say int[], it would still be usable by a variety of containers:

int average(int[] r) { /* same impl */ }


D has two containers built in that provide this range:

int[50] staticArray;
int[] dynamicArray = new int[](50);

average(staticArray[]); // works
average(dynamicArray); // works

Pointers also offer this:

import core.stdc.stdlib : malloc;
int* pointer = cast(int*) malloc(50 * int.sizeof); // contents are uninitialized until written
average(pointer[0 .. 50]);



Moreover, user-defined types can also provide this range:

struct Numbers {
    int[] opSlice() { return [1,2,3]; }
}

Numbers numbers;
average(numbers[]); // works
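To make "one function, any kind of input" concrete, here is a minimal runnable sketch that puts the snippets above into a single program using the non-template `average(int[])`, with the loop written as a plain `foreach`. The fill values, the `free` call and the null check are additions for the sketch (malloc'd memory starts out uninitialized, so it is filled before use).

```d
import core.stdc.stdlib : malloc, free;

// Non-template version: one compiled function, written against a borrowed slice.
int average(int[] r) {
    if (r.length == 0) return 0; // avoid dividing by zero on empty input
    int sum = 0;
    foreach (x; r) sum += x;
    return sum / cast(int) r.length;
}

struct Numbers {
    int[] opSlice() { return [1, 2, 3]; } // GC-allocated array literal
}

void main() {
    int[50] staticArray = 4;               // stack memory, every element set to 4
    int[] dynamicArray = new int[](50);    // GC memory
    dynamicArray[] = 6;

    int* pointer = cast(int*) malloc(50 * int.sizeof); // C heap memory
    assert(pointer !is null, "out of memory");
    scope(exit) free(pointer);
    pointer[0 .. 50] = 8;                  // malloc does not initialize the block

    Numbers numbers;

    // Four different owners, one function, no templates involved.
    assert(average(staticArray[]) == 4);
    assert(average(dynamicArray) == 6);
    assert(average(pointer[0 .. 50]) == 8);
    assert(average(numbers[]) == 2);       // (1 + 2 + 3) / 3
}
```

The function never learns who owns the memory; it only needs the memory to stay alive for the duration of the call.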

In theory, we could provide either an inputRangeObject or a slice for linked lists, lazy generators, anything. One function, any kind of input.
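The inputRangeObject route can be sketched with Phobos's `std.range.interfaces`: `average` below is an ordinary non-template function taking the `InputRange!int` interface, and `inputRangeObject` wraps a lazy `iota` generator and an `SList` linked list so both fit it. This is a sketch, not the only way to do it; the cost is a virtual call per element in exchange for a single compiled function.

```d
import std.container.slist : SList;
import std.range : iota;
import std.range.interfaces : InputRange, inputRangeObject;

// One compiled function; any input range of int can be passed once wrapped.
int average(InputRange!int r) {
    int count = 0;
    int sum = 0;
    while (!r.empty) {
        count++;
        sum += r.front;
        r.popFront();
    }
    return count == 0 ? 0 : sum / count;
}

void main() {
    // A lazy generator: 1, 2, 3 are produced on demand, never stored in an array.
    assert(average(inputRangeObject(iota(1, 4))) == 2);

    // A singly linked list, iterated through its range.
    auto list = SList!int(2, 4, 6);
    assert(average(inputRangeObject(list[])) == 4);
}
```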


Of course, we could slice memory from any allocator. Heck, we saw three different allocations right here (three different kinds: stack, GC, and malloc) all going through the same function, without templating.




I'm sure none of this is new to you... and this is basically how the Rust thing works too. Our int[] slices (or input ranges) are borrowed pointers, and algorithms are written in terms of them.

The ownership type only matters when you store it. And it turns out this matters in D as well:

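import core.stdc.stdlib : malloc, free;

struct ManualArray(T) {
    size_t length;
    T* data;

    this(size_t len) {
        data = cast(T*) malloc(T.sizeof * len); // malloc returns void*, so cast
        length = len;
        data[0 .. len] = T.init; // malloc leaves the memory uninitialized
    }
    ~this() { free(data); }
    T[] opSlice() { return data[0 .. length]; }
    @disable this(this); // copying this is wrong, don't allow it!
}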

void main() {
    auto array = ManualArray!int(50);
    average(array[]); // works, reusing our pointer
}
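A self-contained version of the ManualArray example might look like this. It fills the block with known values so `average` has something meaningful to read, and uses `std.traits.isCopyable` to check at compile time that `@disable this(this)` really rejects copies. That is what keeps a single owner, and therefore a single `free`.

```d
import core.stdc.stdlib : malloc, free;
import std.traits : isCopyable;

struct ManualArray(T) {
    size_t length;
    T* data;

    this(size_t len) {
        data = cast(T*) malloc(T.sizeof * len);
        assert(data !is null || len == 0, "out of memory");
        length = len;
        data[0 .. len] = T.init; // malloc does not initialize memory
    }
    ~this() { free(data); }                      // the single owner frees once
    T[] opSlice() { return data[0 .. length]; }  // hands out a borrowed view
    @disable this(this);                         // a copy would mean a double free
}

int average(int[] r) {
    if (r.length == 0) return 0;
    int sum = 0;
    foreach (x; r) sum += x;
    return sum / cast(int) r.length;
}

void main() {
    auto array = ManualArray!int(4);
    int[] view = array[];   // borrowed: valid only while `array` is alive
    view[] = [2, 4, 6, 8];
    assert(average(array[]) == 5);

    // Copying the owner is rejected at compile time.
    static assert(!isCopyable!(ManualArray!int));
}
```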


But borrowing becomes a problem if we store it:

int[] globalArray;
void foo(int[] array) {
    globalArray = array;
}

void bar() {
    auto array = ManualArray!int(50);
    foo(array[]); // uh oh
}

void main() {
   bar();
   globalArray[0] = 10; // crash likely, memory safety violated
}



Again, I'm sure none of this is new to you, but it illustrates owned vs. borrowed: ManualArray is owned. Storing it is safe; it ensures its internal pointer is valid throughout its entire lifetime.

But ManualArray.opSlice returns a borrowed reference. Great for algorithms or any processing that doesn't escape the reference. Anything that would be written in terms of an input range is probably correct with this.

However, we stored the borrowed reference, which is a no-no. When array went out of scope at the end of bar, its destructor freed the memory, leaving the escaped borrowed reference dangling.


Let's say we did want to store it. There are a few options: we could make our own copy, or store a copy the caller has already made.

GC!(int[]) globalArray;
void foo(GC!(int[]) array) { globalArray = array; }


That's sane: the GC owns it, and the signature says so, so storing it is cool.

We could also take a RefCounted!(int[]), if that's how we wanted to store it.


But let's say we wanted to store it with a different method. There are only two sane options:


void foo(int[] array) { globalArray = array.dup; }

Take a borrowed reference and make a copy of it. The function foo is in charge of allocating (here, we made a GC managed copy).
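This first option can be sketched end to end by rerunning the earlier crash scenario with the copy moved into `foo`. To keep the sketch short, a stack buffer stands in for ManualArray: `bar` passes a slice of its own stack frame, yet `globalArray` stays valid after `bar` returns because `foo` stored a GC-owned duplicate.

```d
int[] globalArray;

// Accept a borrowed slice; take a private, GC-managed copy before storing it.
void foo(int[] array) { globalArray = array.dup; }

void bar() {
    int[4] stackArray = [1, 2, 3, 4];
    foo(stackArray[]); // the slice points into bar's stack frame
}   // stackArray is gone here, but globalArray does not refer to it

void main() {
    bar();
    globalArray[0] = 10; // safe: writes to the GC-owned copy
    assert(globalArray == [10, 2, 3, 4]);
}
```

The price is an allocation inside `foo` that callers cannot see from the call site.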


OR, don't implement that and force the user to decide:


void foo(GC!(int[]) array) {...}


user:

foo(ownedArray[]); // error, cannot implicitly convert int[] to GC!(int[])
int[50] stackArray;
foo(stackArray[]); // error, cannot implicitly convert int[] to GC!(int[])


Now, the user makes the decision. It is going to be stored, the function signature says that up front by asking for a non-borrowed reference. They won't get a surprise crash when the globalArray later accesses stack or freed data. They have to deal with the error. They might not call the function, or they might do the .dup themselves. Either way, memory safety is preserved and inefficiencies are visible.
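The `GC!(int[])` type above is hypothetical; it is not part of Phobos. One minimal way it could look is sketched below, assuming a wrapper struct plus a hypothetical `gcDup` helper as the way to make one from a slice (`borrow` is likewise an invented name). D does not implicitly construct a struct from a function argument, so handing `foo` a plain slice fails to compile and the caller has to make the copy visibly.

```d
// Hypothetical ownership wrapper: the type says "the GC owns this memory".
struct GC(T) {
    private T payload;
    T borrow() { return payload; } // lend out a view for processing
}

// Hypothetical helper: the allocation happens here, visibly, at the call site.
GC!(int[]) gcDup(int[] borrowed) {
    return GC!(int[])(borrowed.dup);
}

GC!(int[]) globalArray;
void foo(GC!(int[]) array) { globalArray = array; } // storing is fine: the GC owns it

void main() {
    int[50] stackArray;
    stackArray[0] = 7;

    // A borrowed slice is not accepted; the caller has to decide how to copy it.
    static assert(!__traits(compiles, foo(stackArray[])));

    foo(gcDup(stackArray[]));
    stackArray[0] = 99;                    // changing the stack buffer...
    assert(globalArray.borrow()[0] == 7);  // ...does not touch the stored copy
}
```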



So, a function that stores a reference would only ever come in one or two signatures, regardless of how many allocation strategies its callers use:

1) the exact match for the callee's allocation strategy. The callee, knowing what the strategy is, can also be sanely responsible for freeing it. (A struct dtor, for example, knows that its members are malloced and can thus call free)

2) A generic borrowed type, e.g. an input range or slice, of which it then makes a private copy internally. Since these are arguably hidden allocations, you might not even like these. Calling .dup (or whatever) at the call site keeps the allocations visible.
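Signature 1 can be sketched with the ManualArray type from earlier: the function that stores the array asks for a `ManualArray!int` itself, and the caller hands over ownership with `std.algorithm.mutation.move`. `Holder` and `store` are hypothetical names; the point is that ManualArray's destructor, which knows the memory came from malloc, remains the one place that frees it.

```d
import core.stdc.stdlib : malloc, free;
import std.algorithm.mutation : move;

struct ManualArray(T) {
    size_t length;
    T* data;

    this(size_t len) {
        data = cast(T*) malloc(T.sizeof * len);
        assert(data !is null || len == 0, "out of memory");
        length = len;
        data[0 .. len] = T.init; // malloc does not initialize memory
    }
    ~this() { free(data); } // free(null) is a no-op, so a moved-from value is harmless
    T[] opSlice() { return data[0 .. length]; }
    @disable this(this);
}

// Hypothetical long-term owner.
struct Holder {
    ManualArray!int storage;
}

Holder globalHolder;

// Exact match for the allocation strategy: the caller must give up ownership.
void store(ManualArray!int array) {
    move(array, globalHolder.storage); // old storage is destroyed, `array` reset to .init
}

void main() {
    auto array = ManualArray!int(3);
    int[] view = array[];
    view[] = 5;

    store(move(array));
    assert(array.data is null);                  // the caller no longer owns the memory
    assert(globalHolder.storage[] == [5, 5, 5]);
}
```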




So, bottom line: you don't duplicate functions for the different types. You borrow references for processing (analogous to implementing algorithms with ranges) and own references for storing... which you need to know about, so only one type makes sense.
