On Thursday, 26 February 2015 at 21:33:53 UTC, Marc Schütz wrote:
On Thursday, 26 February 2015 at 17:56:14 UTC, Zach the Mystic wrote:
On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz wrote:
struct A {
    B* b;
    ~this() {
        b.doSomething();
    }
}

struct B {
    void doSomething();
}

void foo() {
    A a;      // declscope(1)
    B b;      // declscope(1)
    a.b = &b; // refscope(1) <= declscope(1): OK
    // end of scope:
    // `b` is destroyed
    // `a`'s destructor is called
    // => your calling a method on a destroyed object
}

Basically, every variable needs to get its own declscope; all declscopes form a strict hierarchy (no partial overlaps).

Well, technically you only need one per variable with a destructor. Fortunately, this doesn't seem hard to add. Just another few bits, allowing as many declarations with destructors as seem necessary (4 bits = 15 variables, 5 bits = 31 variables, etc.), with the last being treated conservatively as unsafe. (I think anyone declaring 31+ variables with destructors in a function, and taking the addresses of those variables has bigger problems than memory safety!)

I guess this is the "Language versus Legacy" issue. I think D's strength is in it's language, not its huge legacy codebase. Therefore, I find myself going with the #pleasebreakourcode crowd, for the sake of extending D's lead where it shines.

I'm too, actually, but it would be a really hard sell.

But look, Walter and Andrei were fine with adding `return ref` parameters. There's hope yet!

I'm not sure all references in safe code need to be `scope` - that would break a lot of code unto itself, right?

Not sure how much would be affected. I actually suspect that most of it already behaves as if it were scope, with the exception of newly allocated memory. But those should ideally be "owned" instead.

But your right, there still needs to be an opt-out possibility, most likely static.

I don't even have a use for `scope` itself in my proposal. The only risk I'm running is a lot of false positives -- safe constructs which the detection mechanism conservatively treats as unsafe because it can't follow the program logic. Still, it's hard for me to imagine even these appearing very much. And they can be put into @trusted lambdas -- all @trusted functions are treated as if they copy no references, effectively canceling any parameter attributes to the contrary.

T* fun(T* a, T** b) {
 T* c = new T;
 c = a;
 *b = c;
 return c;
}

Algorithm for inference of ref scopes (= parameter annotations):

1) Each variable, parameter, and the return value get a ref scope (or ref depth). A ref scope can either be another variable (including `return` and `this`) or `static`.

2) The initial ref scope of variables is themselves.

Actually, no. The *declaration* scope is themselves. The initial ref scope is whatever the variable is initialized with, or just null if nothing. We could even have a bit for "could be null". You might get some null-checking out of this for free. But then you'd need more attributes in the signature to indicate "could be null!" But crashing due to null is not considered a safety issue (I think!), so I haven't gone there yet.

3) Each time a variable (or something reachable through a variable) is assigned (returning is assignment to the return value), i.e. for each location in the function that an assignment happens, the new scope ref will be:

3a) the scope of the source, if it is larger or equal to the old scope

If scope depth is >= 1, you inherit the maximum of the source and the target. If it's 0, you do a bitwise OR on the mystery scopes (unless the compiler can easily prove it doesn't need to), so you can accumulate all possible origins of the assigned-to scope.

3b) otherwise (for disjunct scopes, or assignment from smaller to larger scope), it is an error (could potentially violate guarantees)

I don't have "disjunct scopes". There's just greater than and less than. The mystery scopes are for figuring out what the parameter attributes are, and in the absence of inference, causing errors in safe code for the parameters not being accurately marked.

4) If a source scope refers to a variable (apart from the destination itself), for which not all assignments have been processed yet, it is put into a queue, to be evaluated later. For code like `a = b; b = a;` there can be dependency cycles. Such code will be disallowed.

No, my system is simpler. I want to make this proposal appealing from the implementation side as well as from the language side. You analyze the code in lexical order:

T* dum(T* a) {
  T* b = a; // b accumulates a
  return b; // okay... lexical ordering, b has a only
  T c;
  b = &c; // now b accumulates scopedepth(1);
  return b; // error here, but *only* here
}

The whole process relies on accumulating the scopes as the compiler encounters them. There are cases of branching conditional, combined with goto labels, or the beginnings of loops, where the logical order could be different from the lexical order. Only *these* cases are pushed onto an array and revisited when the branching conditional is complete. Because it's more likely (possibly mathematically certain) to catch all problems, these statements are "reheated" in reverse order. My reasoning for this is to keep compiler passes to a minimum, to save compilation time. In theory, all the scope assignments could be traversed again and again, until no scope was left unturned, so to say, but I wanted to come up with something with what you call an O(1) compilation time.

Honestly, it's almost impossible to say what the tax in compilation time will be until something's implemented (something I learned from Walter).

How exactly the scope of a complex expression has to be computed is left open here.

If you call a function, the return value (if a reference) will have a scope which can be deduced from the function signature. You inherit the scope of what you pass accordingly, and pass those scopes on to the next function (if you're in a function chain), or the "out!" parameters, if need be:

T* fun(return T* a, T* b, out!b T** c); // signature only

void gun() {
  T e; // local
  T* f;
  T** g = new T*;
  f = fun(&e, f, g); // f inherits scope of(&e), g inherits f
}

The results of a called function are just inherited as indicated by the function signature. I don't know what other kinds of "complex expression" you are referring to.

In the end, if there was no error, all variables, parameters and the return value will have a minimum reference scope assigned. If that scope is the variable itself, they can be inferred as `scope`. If it is a parameter, that parameter get an `out!identifier` or `return` annotation.

The function's final return scope is used to assign "return" to the parameter attributes for the final function signature, in the case of attribute inference, and the parameter attributes are used to deduce the return scope when the function is called.

Note that the order in which the "assignments" occur inside the function doesn't matter. This is more restrictive than strictly necessary, but it's certainly ok in most cases, easy to work around when not, and it doesn't require data/control flow analysis.

This is different from my proposal. I aim to just go in lexical order, with a little extra work done in when lexical order is detected as possibly being different from the logical order (in a conditional inside a loop).

(By the way: inference cannot work for recursive functions.)

I would like to see a "best effort" approach taken for solving the problem of recursive function inference. I think a function should be considered "innocent until proven guilty" as regards 'pure', for example. It's one of those things which seems like it's really hard to screw up. How could a function which is otherwise pure become impure just because it calls itself?

T hun(...) {
  [no impure code]
  hun(...);
  [no impure code]
}

I may be wrong, but I can't figure out how this function could magically become impure just because it calls itself. The same goes for the other attributes. And you can use the same trick, of pushing questionable expressions onto a stack or array, and just revisiting them at the end of the function to check for attribute violations. But I admit I don't really understand why attributes can't be inferred with recursive calls in the general case. Maybe somebody can explain to me what I'm missing here.

Your example:

T* fun(T* a, T** b) {
    // => S(a) = a
    // => S(b) = b
    // => S(return) = <doesn't matter>
    T* c; // == (T*).init == null
    // => S(c) = c
    c = new T;
    // `new` returns static, which is wider than c

`c's reference hasn't been assigned until now, so it's neither wider nor narrower. We're not tracking null references yet, so I'm just treating them like they're global.

    // => S(c) = static
    c = a;
    // => invalid, narrowing not allowed
    // (this is what I asked about, and now I
    // see why it's necessary)

Actually this is fine, I think. Even if `c` inherited something narrower than "new T" (i.e. depth 1), it would be fine, because it would now be considered depth(1) and could no longer be copied to anything with depth <1. It might or might not store a global, but for safety reasons it must now be treated with the narrowest it could possibly have. The error now would be if you copied it *back* to a parameter or a global. (Difference between `c's declaration scope `&c` = (1), and its reference scope = null, until otherwise assigned.)

    // let's assume it didn't happen, so that
    // the next two statements work
    *b = c;
    // => S(b) = S(c) = static
    return c;
    // => S(return) = S(c) = static
}

This would be fine, since your code only has a `new T` and a `T*` parameter copied to c so far. In the case of inference, the function now infers: "T fun(return T* a, out!a T** b)". In the absence of inference, it gives errors on both counts (in @safe code of course, as always). And we're not tracking null yet (which is a different issue), so I won't worry about that. Also, in non-branching code, the compiler could actually know that c was no longer null at this time.

Something else that needs consideration: What happens when parameters alias each other? I think it is ok, because the checking phase will naturally prohibit calling functions in a way that would break the guarantees, but I haven't thought it through completely.

I'm not sure what you mean. I don't think it's a problem.

I'm actually thinking of reusing `noscope` as a function attribute (`@noscope` perhaps) which says that the function may return a heap or global reference. This is all that's necessary to complete an ownership system. If a scope has exactly 1 "mystery" bit set, and is known not to come from the heap or a global, then you know that it *must* contain a reference to exactly the parameter for which the mystery bit is set. You know exactly what it contains == ownership.

I will have to think about this, but I believe you cannot express such concepts as deadalnix's islands, or "const borrowing". But maybe, if we're lucky, I'm wrong :-)

We'll see!

Reply via email to