Re: Escape analysis (full scope analysis proposal)

Michel Fortin Fri, 14 Nov 2008 04:50:28 -0800

On 2008-11-13 00:53:50 -0500, Andrei Alexandrescu <[EMAIL PROTECTED]> said:

Michel Fortin wrote:
Everywhere I said there was no need for named regions, I also said named regions could be kept to ease the syntax. That said, I'm not so sure named regions are that good at simplifying the syntax. In your assign example above, the named-region version has an error: it forces the two pointers to be of the same region. That could be fine, but, assuming you're assigning to *p, it'd be more precise to write it like that:
    void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);
No, the code is correct as written (without the if). You may want to reread the paper with an eye for region subtyping rules. This partly backs up my point: understanding region analysis may be quite a burden for the average programmer. Even you, who took pains to think through everything and absorb the paper, are having trouble. And me too to be honest :o).

Ok, I've reread that part and it's true that using Cyclone's subtyping rules it'd work fine with only one region name because Cyclone implicitly creates two regions from that, the first being a subset of the other, just as I wrote explicitly here. But what I missed was one of Cyclone's syntactic constructs, not a concept of regions.

... or perhaps we have a different notion of what is a syntax and what is a concept?

Once we get there, I think the no-named region syntax is better.


This is invalidated by the wrong assertion above.

Yes and no. It's true that Cyclone's region subtyping makes the syntax prettier. On the other hand, the programmer has to be aware of how it works, and especially aware that changing the order of his arguments will implicitly change the region relationship between them.

That said, for the swap example, where both values need to share the same region, the named region notation is simpler:
    void swap(region R)(int*R a, int*R b);
    void swap(int* a, int* b) if (region(a) == region(b));
No, for that swap there is no need to specify any region. You can swap ints in any two regions. Probably you meant to use int** throughout.


Hum, you're right, I meant to make these "ref int*".

But I'd argue that most of the time regions do not need to be equal, but are subset or superset of each other, so reusing variable names makes more sense in my opinion.
Don't forget that using a region name twice may actually work with two different regions, so far as they are in a subtyping relationship. Region subtyping is key to both simplifying code and to understanding code after simplification.

I'm not convinced that region subtyping is so simple to understand for neophytes, especially because you may assume the same region at first glance. Cyclone isn't C++, but this region subtyping rule makes me think of one of those many little known corners in C++ such as Koenig name lookup.

But I consider this just a syntactic issue about how to express regions though. And I may be completely wrong about its unintuitiveness.

In any case, I prefer a notation where region constraints are attached directly to the type instead of being expressed somewhere else. Something like this (explained below):
    void assign(int*(r)* p, int* r) { *p = r; }
    void swap(ref int*(b) a, ref int*(a) b);
Sure. I'm sure there's understanding that that doesn't make anything any simpler or any easier to implement or understand. It's just a minor change in notation, and IMHO not to the better.

Ok, then we disagree here. I think this notation is better because it makes you think about things in terms of pointer lifetime vs. the pointed data lifetime, which I think is much less abstract than variables being part of different regions where some regions encompass other regions. It's a shift in perspective from the syntactic approach of Cyclone, although under the hood the compiler would do mostly the same work.

Here, a parenthesis suffix after a pointer indicates the region constraint of the pointer, based on the region of another pointer.
I thought it means pointer to function. Oops.

And I thought the syntax was the least of your concerns right now? :-) This probably can't be the final syntax, but I think it makes things clear enough to talk about the concepts... for now.

In the first example, int*(r)* means that the integer pointer "*p" must not live beyond the value pointed to by "r" (because we're going to assign "r" to "*p"). In the second example, the value pointed to by "a" must not live longer than the one pointed to by "b" and the value pointed to by "b" must not live longer than the one pointed to by "a"; the net result is that they must have the same lifetime and need to be in the same region.
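As a rough illustration of what the first constraint is meant to catch, here is a sketch in today's plain D with no region annotations at all. The names global, okCase and the commented-out badCase are made up for this example. The point is only that, as far as I can tell, the unannotated assign accepts both calls in ordinary unchecked code, whereas the int*(r)* annotation is intended to reject the second one.

```d
// Unannotated version of assign: plain D, no region constraints.
void assign(int** p, int* r) { *p = r; }

int global = 7; // hypothetical variable that outlives every stack frame

// Hypothetical helper: the pointee (global) outlives the pointer p,
// so storing its address through assign is harmless.
int okCase()
{
    int* p;
    assign(&p, &global);
    return *p;
}

// The case the constraint is meant to reject (kept as a comment on purpose):
//
//     int* longLived;                  // module-level pointer
//     void badCase()
//     {
//         int local = 42;
//         assign(&longLived, &local);  // *p would outlive what r points to
//     }
```

With the proposed notation, the call in okCase would satisfy the constraint and the one in badCase would not, because longLived lives longer than local.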
For something more complicated, you could give multiple comma-separated constraints:
    void choose(ref int*(a,b) result, int* a, int* b)
    {
        result = rand() > 0.5 ? a : b;
    }
This all is irrelevant. You essentially change the syntax. Syntax is, again, the least of the problems to be solved.


Ok then. Let's go to the real problems.

I suspect there are things you can't even express without symbolic regions. Consider this example from Dan's slides:
struct ILst(region R1, region R2) {
     int *R1 hd;
     ILst!(R1, R2) *R2 tl;
}
This code reflects the fact that the list holds pointers to integers in one region, whereas the nodes themselves are in a different region. It would be a serious challenge to tackle that without symbolic regions, and simpler that won't be for anybody.
Today's templates are just fine for that. Just propagate variables through template arguments and apply region constraints to the members:
    struct ILst(alias var1, alias var2) {
        int*(var1) hd;
        ILst!(var1, var2)*(var2) tl;
    }
    int z;
    int*(z) a, b;
    ILst!(a, b) lst1;
    ILst!(&z, &z) lst2;
I hope you agree that this is just written symbols without much meaning. This is not half-baked. It's not even rare. The cow is still moving. I can't eat that! :o) I can't even start replying to it because there are so many actual and potential issues, I'd need to get to work on them first.

If you mean there aren't any explanations, then you're right that explanations were somewhat missing from my last post. Sorry. I guess I was too tired to notice the lack of instructions.

Basically you apply the same rules as for the function signatures in the preceding function examples. For instance, "int*(var1)" means the hd pointer points to an int that lives at least as long as the one pointed to by var1 (var1 must be an "int*" pointer). This means that you can assign the content of var1 to it, or anything else that will live at least as long as var1. It also means you can take its value and place it in var1, or any pointer with a shorter life.

Then, we have "ILst!(var1, var2)*(var2)". It's the same rules as the first, except that we have a different type beyond the pointer which must be valid through var2's lifetime.


The last code snippet shows how to use that template.

   int z;
   int*(z) a, b;
   ILst!(a, b) lst1;
   ILst!(&z, &z) lst2;

Here, we're declaring "int*(z)", which is a pointer to an int whose lifetime is equal to or longer than the address of z. (ok, there's an error here, it should have been "int*(&z)"). And normally, you wouldn't explicitly write that, "int*" would be enough: the compiler should determine the default constraints automatically.

Then when you instantiate ILst!(a, b), the template will take the lifetime of a and b (which is the lifetime of the address of z) and apply it to pointers inside the struct.

We could even allow regions to propagate through type arguments too:

    struct ILst2(T1, T2) {
        int*(T1) hd;
        ILst2!(T1, T2)*(T2) tl;
    }
    ILst2!(typeof(&z), typeof(b)) lst3;

Again, some explanations were missing... Basically, region/scoping/lifetime constraints are attached to pointers. Which means that propagating a type ought to be enough to propagate the lifetime constraints too. "ILst2!(typeof(&z), typeof(b))" is exactly the same as "ILst!(&z, b)". ILst takes its constraints from variables while ILst2 takes its constraints from types.

But the two previous examples are a little stretched to make the concept more similar to Cyclone. With my proposal, you can do much better than this.

I think in most cases where you want to propagate constraints, you'll want to propagate a type too. If what you want is a linked list, it'd be better expressed generically like this:


    struct ListRoot(T) {
        ListNode!(T)* first;
    }
    struct ListNode(T) {
        T hd;
        ListNode!(T)* tl;
    }

    int global;
    void foo() {
        int a;
        ListRoot!(int*) listRoot;
        ListNode!(int*) listNode;
        listRoot.first = &listNode;
        listNode.hd = &a;
        listNode.hd = &global;
    }

Notice how there is absolutely no special annotation here; it's already valid template code.
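To make the "already valid template code" point concrete, here is a small self-contained sketch in plain D that builds a two-node list on the stack and walks it. The helper names sumList and demo are made up for this illustration, and the tl member is typed as ListNode!(T)* on the assumption that this is what the list above intends.

```d
// Plain D, no region annotations: a list whose nodes live on the stack.
struct ListRoot(T)
{
    ListNode!(T)* first;
}

struct ListNode(T)
{
    T hd;             // payload; T is int* in this example
    ListNode!(T)* tl; // next node, or null at the end of the list
}

// Hypothetical helper: sums the ints reachable through the list.
int sumList(ref ListRoot!(int*) root)
{
    int total = 0;
    for (auto node = root.first; node !is null; node = node.tl)
        total += *node.hd;
    return total;
}

// Hypothetical demo: every node and every pointed-to int lives in this
// stack frame, so nothing outlives what it points to.
int demo()
{
    int a = 1;
    int b = 2;
    ListNode!(int*) second;
    second.hd = &b;
    ListNode!(int*) first;
    first.hd = &a;
    first.tl = &second;
    ListRoot!(int*) root;
    root.first = &first;
    return sumList(root);
}
```

Nothing here stops root.first from being copied somewhere longer-lived, which is the gap the default constraints are supposed to close.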

Now, let the compiler apply some defaults according to these rules: types declared in local variables will be allowed to point to values of their own region, and struct members will be allowed to point to values of the same region the struct comes from. Annotated explicitly, the default annotations would look like this:


    struct ListRoot(T) {
        ListNode!(T)*(this) first; // pointer to something in the same region as this
    }
    struct ListNode(T) {
        T value; // if T is a pointer, it holds its own region annotations
        ListNode!(T)*(this) next; // pointer to something in the same region as this
    }

    int global;
    void foo() {
        int a;
        ListRoot!(int*(&listRoot)) listRoot;
        ListNode!(int*(&listNode)) listNode;
        listRoot.first = &listNode;
        listNode.value = &a;
        listNode.value = &global;
    }

With this scheme, the lifetime of each node in the linked list needs to be equal to or longer than that of the preceding node (normally, they will all be equal), and the lifetime of the value pointer is determined by the type you give as a template argument to ListRoot and ListNode. Therefore, it becomes possible to construct the linked list on the stack when the root is on the stack, with no need for explicit annotations.

There is still one problem though. If you want to swap two nodes, you can't, because there is no guarantee that the lifetime of the "this" pointer of a ListNode is equal to the lifetime of the "next" pointer. (In fact, the next pointer lifetime is longer than or equal to the struct lifetime). So if we're going to swap or reorder nodes, we'll need a way to constrain the "this" pointer against the "next" pointer to create a circular reference and thus force the two pointers to point to the same region... perhaps something like this:

        
    struct ListNode(T) {
        ListNode*(next) this;
        T value;
        ListNode!(T)*(this) next;
    }

Not a very good syntax though.

I think this example is a good case for attaching region constraints directly to types instead of expressing them as conditional expressions elsewhere, as in "if (region a <= region b)".
I am thoroughly lost here, sorry. I can't even answer "this is so wrong" or "this is pure genius". Probably it's somewhere in between :o). At any rate, I suggest you develop a solid understanding of Cyclone if you want to build something related to it.


I'll side with "pure genius", but I also consider myself biased. :-)

I'm sorry about how you feel. Now we're in a conundrum of sorts. You seem to strongly believe you can make some nice simplified regions work, and make people like them. Taking that to a proof is hard. The conundrum is, you are facing the prospect of putting work into it and creating a system that, albeit correct, is not enticing.
Currently, I'm just trying to convince you (and any other potential silent listeners) that it can work.
I understand I've been blunt throughout this post, but please side with me for a minute. I'm doing so for the following reasons: (a) I'm essentially writing this post in negative time; (b) I believe you currently don't have an attack on the problem you're trying to solve; (c) I believe it's worthwhile for you to develop an attack on the problem, (d) I think "we" = "the D community" should seriously consider safety and consequently things like region analysis.


I don't mind about (a) and I agree about (d).

I'll say that because of my lack of expertise with Cyclone I have some difficulty expressing my proposal as a comparison of what is different from Cyclone (it's difficult enough without it). You're the one asking for such a comparison and increasing the difficulty. I do not dislike the challenge, but I don't think you can take this as proof that I don't understand well the problem I'm trying to solve when I may just be mixing up some things about the approach taken by Cyclone.

Another thing not helping is that my original proposal has evolved a little since the first time I started the "full scope analysis proposal" thread. I also revamped the syntax I use to talk about the problem (and apparently I should do it again to avoid a conflict with function names). Hunting in previous posts for the details I leave out in the more recent ones doesn't help anyone understand what I'm talking about.

I'm thinking that maybe I should put everything in one document to have a coherent proposal that could evolve as a whole instead of one scattered across various posts between which the syntax I use and some concepts have evolved.

You can now stop siding with me and side again with yourself. At this point you can easily guess that all of the above was to prepare you for an even blunter comment. Here goes.
You say you want to convince people "it can work". But right now there is no "it". You have no "it". Much less an "it" that can work.
But there is of course good hope that an "it" could emerge, and I encourage you to continue working towards that goal. It's just a lot more work than it might appear.

I'm pretty sure I hold that "it" just now, or something very near it. It's just that it seems I haven't explained it well enough for you (and probably anyone) to understand correctly. I should probably write it all down in one coherent and more formal document rather than scattering all the details over many different posts as half-documented concept-name-changing written-too-fast examples.



--
Michel Fortin
[EMAIL PROTECTED]
http://michelf.com/
