Re: My Reference Safety System (DIP???)

Zach the Mystic via Digitalmars-d Thu, 26 Feb 2015 17:31:14 -0800

On Thursday, 26 February 2015 at 21:33:53 UTC, Marc Schütz wrote:

On Thursday, 26 February 2015 at 17:56:14 UTC, Zach the Mystic wrote:
On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz wrote:
struct A {
    B* b;
    ~this() {
        b.doSomething();
    }
}
struct B {
    void doSomething();
}

void foo() {
    A a;      // declscope(1)
    B b;      // declscope(1)
    a.b = &b; // refscope(1) <= declscope(1): OK
    // end of scope:
    // `b` is destroyed
    // `a`'s destructor is called
    // => you're calling a method on a destroyed object
}
Basically, every variable needs to get its own declscope; all declscopes form a strict hierarchy (no partial overlaps).

Well, technically you only need one per variable with a destructor. Fortunately, this doesn't seem hard to add. Just another few bits, allowing as many declarations with destructors as seem necessary (4 bits = 15 variables, 5 bits = 31 variables, etc.), with the last being treated conservatively as unsafe. (I think anyone declaring 31+ variables with destructors in a function, and taking the addresses of those variables, has bigger problems than memory safety!)
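
A minimal sketch of that bit budget in D, under stated assumptions: the names `DeclScopeAllocator`, `allocate` and `isConservative` are hypothetical, id 0 is reserved for "no destructor", and the highest id is the shared, conservatively treated one. None of this is in the proposal itself; it only illustrates the saturation idea.

```d
// Hypothetical sketch: hand out declaration-scope ids from a fixed bit budget.
enum uint idBits = 4;                  // assumed width ("4 bits = 15 variables")
enum uint maxId = (1u << idBits) - 1;  // 15: the last id, shared by all overflow

struct DeclScopeAllocator
{
    uint next = 1; // id 0 is reserved for "no destructor" (an assumption)

    // Returns the id for the next variable that has a destructor.
    uint allocate()
    {
        if (next >= maxId)
            return maxId; // saturate: every further variable shares the last id
        return next++;
    }
}

// A variable holding the saturated id cannot be ordered against the others,
// so a checker would have to treat taking its address as unsafe.
bool isConservative(uint id)
{
    return id == maxId;
}
```

With 4 bits this gives precise ordering for ids 1 to 14; everything from the fifteenth such variable onward lands on id 15 and is rejected conservatively.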

I guess this is the "Language versus Legacy" issue. I think D's strength is in its language, not its huge legacy codebase. Therefore, I find myself going with the #pleasebreakourcode crowd, for the sake of extending D's lead where it shines.
Me too, actually, but it would be a really hard sell.

But look, Walter and Andrei were fine with adding `return ref` parameters. There's hope yet!

I'm not sure all references in safe code need to be `scope` - that would break a lot of code in itself, right?
Not sure how much would be affected. I actually suspect that most of it already behaves as if it were scope, with the exception of newly allocated memory. But those should ideally be "owned" instead.
But you're right, there still needs to be an opt-out possibility, most likely static.

I don't even have a use for `scope` itself in my proposal. The only risk I'm running is a lot of false positives -- safe constructs which the detection mechanism conservatively treats as unsafe because it can't follow the program logic. Still, it's hard for me to imagine even these appearing very much. And they can be put into @trusted lambdas -- all @trusted functions are treated as if they copy no references, effectively canceling any parameter attributes to the contrary.

T* fun(T* a, T** b) {
 T* c = new T;
 c = a;
 *b = c;
 return c;
}
Algorithm for inference of ref scopes (= parameter annotations):
1) Each variable, parameter, and the return value gets a ref scope (or ref depth). A ref scope can either be another variable (including `return` and `this`) or `static`.
2) The initial ref scope of variables is themselves.

Actually, no. The *declaration* scope is themselves. The initial ref scope is whatever the variable is initialized with, or just null if nothing. We could even have a bit for "could be null". You might get some null-checking out of this for free. But then you'd need more attributes in the signature to indicate "could be null!" But crashing due to null is not considered a safety issue (I think!), so I haven't gone there yet.

3) Each time a variable (or something reachable through a variable) is assigned (returning is assignment to the return value), i.e. for each location in the function where an assignment happens, the new ref scope will be:
3a) the scope of the source, if it is larger than or equal to the old scope

If scope depth is >= 1, you inherit the maximum of the source and the target. If it's 0, you do a bitwise OR on the mystery scopes (unless the compiler can easily prove it doesn't need to), so you can accumulate all possible origins of the assigned-to scope.
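
One possible reading of that rule, sketched in D. The struct `RefScope`, its fields and the function `assign` are hypothetical names, and for simplicity the sketch ORs the mystery bits unconditionally instead of only at depth 0, which is more conservative than the rule as stated.

```d
import std.algorithm.comparison : max;

// Hypothetical model of the tracked scope of one reference.
struct RefScope
{
    uint depth;   // 0 = parameter/global level; >= 1 = address of a local
    uint mystery; // bit i set = may have come from parameter i
}

// Scope of `target` after the assignment `target = source`.
RefScope assign(RefScope target, RefScope source)
{
    RefScope result;
    // Inherit the deeper (shorter-lived) of the two depths.
    result.depth = max(target.depth, source.depth);
    // Accumulate every parameter the reference might have come from.
    result.mystery = target.mystery | source.mystery;
    return result;
}
```

Because both operations only ever grow the scope, repeated assignments can be folded in whatever order the compiler meets them.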

3b) otherwise (for disjunct scopes, or assignment from smaller to larger scope), it is an error (could potentially violate guarantees)

I don't have "disjunct scopes". There's just greater than and less than. The mystery scopes are for figuring out what the parameter attributes are, and in the absence of inference, causing errors in safe code for the parameters not being accurately marked.

4) If a source scope refers to a variable (apart from the destination itself), for which not all assignments have been processed yet, it is put into a queue, to be evaluated later. For code like `a = b; b = a;` there can be dependency cycles. Such code will be disallowed.

No, my system is simpler. I want to make this proposal appealing from the implementation side as well as from the language side. You analyze the code in lexical order:


T* dum(T* a) {
  T* b = a; // b accumulates a
  return b; // okay... lexical ordering, b has a only
  T c;
  b = &c; // now b accumulates scopedepth(1);
  return b; // error here, but *only* here
}

The whole process relies on accumulating the scopes as the compiler encounters them. There are cases of branching conditionals, combined with goto labels, or the beginnings of loops, where the logical order could be different from the lexical order. Only *these* cases are pushed onto an array and revisited when the branching conditional is complete. Because it's more likely (possibly mathematically certain) to catch all problems, these statements are "reheated" in reverse order. My reasoning for this is to keep compiler passes to a minimum, to save compilation time. In theory, all the scope assignments could be traversed again and again, until no scope was left unturned, so to speak, but I wanted to come up with something with what you call an O(1) compilation time.
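
To make the lexical-order walk concrete, here is a small D simulation of the `dum` example. It is only a sketch: `Tracked`, `canReturn` and `checkDum` are hypothetical names, parameter `a` is assumed to be parameter 0, and the local `c` is assumed to sit at depth 1.

```d
import std.algorithm.comparison : max;

// Hypothetical model of one tracked pointer variable.
struct Tracked
{
    uint depth;   // 0 = parameter/global level, 1+ = address of a local
    uint origins; // bit i set = may hold parameter i
}

// Returning is only safe while nothing local has been accumulated.
bool canReturn(Tracked t)
{
    return t.depth == 0;
}

// Walks the statements of `dum` in lexical order and records
// one verdict per `return` statement.
bool[] checkDum()
{
    bool[] verdicts;
    Tracked b;                  // T* b
    b.origins |= 1u << 0;       // b = a:  b accumulates parameter 0
    verdicts ~= canReturn(b);   // first `return b`: b holds only a
    b.depth = max(b.depth, 1u); // b = &c: b accumulates depth 1
    verdicts ~= canReturn(b);   // second `return b`: now an error
    return verdicts;
}
```

The first `return` is accepted and only the second is flagged, matching the comments in the example.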

Honestly, it's almost impossible to say what the tax in compilation time will be until something's implemented (something I learned from Walter).

How exactly the scope of a complex expression has to be computed is left open here.

If you call a function, the return value (if a reference) will have a scope which can be deduced from the function signature. You inherit the scope of what you pass accordingly, and pass those scopes on to the next function (if you're in a function chain), or the "out!" parameters, if need be:


T* fun(return T* a, T* b, out!b T** c); // signature only

void gun() {
  T e; // local
  T* f;
  T** g = new T*;
  f = fun(&e, f, g); // f inherits scope of(&e), g inherits f
}

The results of a called function are just inherited as indicated by the function signature. I don't know what other kinds of "complex expression" you are referring to.
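
A rough D sketch of how a call site might apply such a signature. Everything here is an assumption for illustration: `ParamAttr` models the `return` and `out!x` annotations, scopes are reduced to plain depths, and `out!` targets inherit the depth the source argument had when the call was made.

```d
import std.algorithm.comparison : max;

// Hypothetical model of one parameter's annotations.
struct ParamAttr
{
    bool returned;    // models `return` on the parameter
    int outFrom = -1; // models `out!x`: index of parameter x, or -1 if absent
}

// Updates the argument depths for `out!` parameters and
// returns the depth of the call's result.
uint applyCall(const ParamAttr[] sig, uint[] argDepths)
{
    uint result = 0;
    uint[] before = argDepths.dup; // depths at the moment of the call
    foreach (i, p; sig)
    {
        if (p.returned)
            result = max(result, before[i]); // result inherits this argument
        if (p.outFrom >= 0)
            argDepths[i] = max(before[i], before[p.outFrom]);
    }
    return result;
}
```

For `fun(return T* a, T* b, out!b T** c)` the signature would be modelled as `[ParamAttr(true), ParamAttr(false), ParamAttr(false, 1)]`; calling it with `&e` at depth 1 gives a result at depth 1, which is what `f` then inherits in `gun`.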

In the end, if there was no error, all variables, parameters and the return value will have a minimum reference scope assigned. If that scope is the variable itself, they can be inferred as `scope`. If it is a parameter, that parameter gets an `out!identifier` or `return` annotation.

The function's final return scope is used to assign "return" to the parameter attributes for the final function signature, in the case of attribute inference, and the parameter attributes are used to deduce the return scope when the function is called.

Note that the order in which the "assignments" occur inside the function doesn't matter. This is more restrictive than strictly necessary, but it's certainly ok in most cases, easy to work around when not, and it doesn't require data/control flow analysis.

This is different from my proposal. I aim to just go in lexical order, with a little extra work done when lexical order is detected as possibly being different from the logical order (in a conditional inside a loop).

(By the way: inference cannot work for recursive functions.)

I would like to see a "best effort" approach taken for solving the problem of recursive function inference. I think a function should be considered "innocent until proven guilty" as regards 'pure', for example. It's one of those things which seems like it's really hard to screw up. How could a function which is otherwise pure become impure just because it calls itself?


T hun(...) {
  [no impure code]
  hun(...);
  [no impure code]
}

I may be wrong, but I can't figure out how this function could magically become impure just because it calls itself. The same goes for the other attributes. And you can use the same trick, of pushing questionable expressions onto a stack or array, and just revisiting them at the end of the function to check for attribute violations. But I admit I don't really understand why attributes can't be inferred with recursive calls in the general case. Maybe somebody can explain to me what I'm missing here.

Your example:

T* fun(T* a, T** b) {
    // => S(a) = a
    // => S(b) = b
    // => S(return) = <doesn't matter>
    T* c; // == (T*).init == null
    // => S(c) = c
    c = new T;
    // `new` returns static, which is wider than c

`c`'s reference hasn't been assigned until now, so it's neither wider nor narrower. We're not tracking null references yet, so I'm just treating them like they're global.

    // => S(c) = static
    c = a;
    // => invalid, narrowing not allowed
    // (this is what I asked about, and now I
    // see why it's necessary)

Actually this is fine, I think. Even if `c` inherited something narrower than "new T" (i.e. depth 1), it would be fine, because it would now be considered depth(1) and could no longer be copied to anything with depth <1. It might or might not store a global, but for safety reasons it must now be treated as having the narrowest scope it could possibly have. The error now would be if you copied it *back* to a parameter or a global. (Difference between `c`'s declaration scope `&c` = (1), and its reference scope = null, until otherwise assigned.)

    // let's assume it didn't happen, so that
    // the next two statements work
    *b = c;
    // => S(b) = S(c) = static
    return c;
    // => S(return) = S(c) = static
}

This would be fine, since your code only has a `new T` and a `T*` parameter copied to c so far. In the case of inference, the function now infers: "T* fun(return T* a, out!a T** b)". In the absence of inference, it gives errors on both counts (in @safe code of course, as always). And we're not tracking null yet (which is a different issue), so I won't worry about that. Also, in non-branching code, the compiler could actually know that c was no longer null at this time.

Something else that needs consideration: What happens when parameters alias each other? I think it is ok, because the checking phase will naturally prohibit calling functions in a way that would break the guarantees, but I haven't thought it through completely.


I'm not sure what you mean. I don't think it's a problem.

I'm actually thinking of reusing `noscope` as a function attribute (`@noscope` perhaps) which says that the function may return a heap or global reference. This is all that's necessary to complete an ownership system. If a scope has exactly 1 "mystery" bit set, and is known not to come from the heap or a global, then you know that it *must* contain a reference to exactly the parameter for which the mystery bit is set. You know exactly what it contains == ownership.
I will have to think about this, but I believe you cannot express such concepts as deadalnix's islands, or "const borrowing". But maybe, if we're lucky, I'm wrong :-)


We'll see!
