Re: Escape analysis (full scope analysis proposal)

Andrei Alexandrescu Wed, 12 Nov 2008 21:55:22 -0800

Michel Fortin wrote:

On 2008-11-12 10:02:02 -0500, Andrei Alexandrescu <[EMAIL PROTECTED]> said:
Michel Fortin wrote:
On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu <[EMAIL PROTECTED]> said:
So my first point is that since we have a garbage collector in D, and moreover since we're likely to get one heap per thread in D2, we don't need dynamic regions. The remaining regions are: 1) the shared heap, 2) the thread-local heap, and 3) all the stack frames; and you can't allocate in any stack frame other than the current one. Because none of these regions requires a handle to allocate into, we (A) don't need region handles.
We still have many regions. Besides the two heaps (shared, thread-local), each function's stack frame, and each block within it, creates a distinct memory region. But nowhere do we need to know exactly which region a function parameter comes from; what we need to know is which address outlives which pointer, so that we can forbid assigning addresses to pointers that outlive them. All we need is a relative ordering of the various regions, and for that we don't need to attach *names* to the regions so that you can refer to them explicitly in the syntax. Instead, you could say something like "region of (x)" or "region of (*y)" and that would be enough.
But then how do you type the assignment example?

void assign(int** p, int* r) { *p = r; }

How do you reflect the requirement that r's region outlives *p's region?

But that's not even the point. Say you define some notation, such as:

void assign(int** p, int * r) if (region(r) <= region(p));
But the whole point of regions was to _simplify_ notations like the above into:
void assign(region R)(int*R* p, int *R r);
So although you think you simplified things by using region(symbol) instead of symbolic names, you complicated things. The compiler still needs to infer regions for each value, so it is as complicated as a named-regions compiler, and in addition you require the user to write bulkier expressions because you disallow the use of symbols. So everybody is worse off. Note how, in the example using a symbolic region, the outlives relationship is enforced implicitly by using the same symbol name in two places.
Everywhere I said there was no need for named regions, I also said named regions could be kept to ease the syntax. That said, I'm not so sure named regions are that good at simplifying the syntax. In your assign example above, the named-region version has an error: it forces the two pointers to be in the same region. That could be fine, but, assuming you're assigning to *p, it'd be more precise to write it like this:
    void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);

No, the code is correct as written (without the if). You may want to reread the paper with an eye for region subtyping rules. This partly backs up my point: understanding region analysis may be quite a burden for the average programmer. Even you, who took pains to think through everything and absorb the paper, are having trouble. And me too, to be honest :o).
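D has no region annotations, so what follows is only an analogy: a minimal Rust sketch, assuming Rust lifetimes are a fair stand-in for Cyclone's named regions (the Rust Reference lists Cyclone's region-based pointers among Rust's influences). The single lifetime `'r` plays the role of `region R` in `assign(region R)(int*R* p, int *R r)`. The hypothetical `demo` function shows that one name still accepts two different regions: `&outer` lives longer than the slot, and the compiler shortens its lifetime to `'r` through subtyping. Passing a reference to a shorter-lived value into a slot that outlives it would be rejected at compile time.

```rust
/// One named lifetime `'r` used in two places, mirroring
/// `void assign(region R)(int*R* p, int *R r)`.
fn assign<'r>(p: &mut &'r i32, r: &'r i32) {
    // Allowed: `r` lives at least as long as the slot `*p` requires.
    *p = r;
}

/// Hypothetical usage: the two arguments come from different regions.
fn demo() -> i32 {
    let outer = 42; // longer-lived region (function body)
    {
        let inner = 7; // shorter-lived region (inner block)
        let mut slot: &i32 = &inner;
        // `&outer` has a longer lifetime; subtyping shortens it to `'r`.
        assign(&mut slot, &outer);
        *slot // copy the integer out before `inner` goes out of scope
    }
}
```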

Once we get there, I think the syntax without named regions is better.


This is invalidated by the wrong assertion above.

That said, for the swap example, where both values need to share the same region, the named-region notation is simpler:
    void swap(region R)(int*R a, int*R b);
    void swap(int* a, int* b) if (region(a) == region(b));

No, for that swap there is no need to specify any region. You can swap ints in any two regions. Probably you meant to use int** throughout.
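Reading the swap example with int** as suggested, a Rust analogue (again only a sketch, with hypothetical function names) might look like this. Because a `&mut` reference is invariant in the type it points to, both slots are forced to agree on the same lifetime `'r`, which is the "same region" constraint a pointer swap needs; no extra `if` clause or second region name is required.

```rust
/// Analogue of swapping two `int*` values through `int**` parameters.
/// Each pointee ends up stored in the other slot, so both slots must
/// hold references with the same lifetime `'r`.
fn swap_ptrs<'r>(a: &mut &'r i32, b: &mut &'r i32) {
    std::mem::swap(a, b);
}

/// Hypothetical usage: swap which integer each reference points to.
fn swap_demo() -> (i32, i32) {
    let x = 1;
    let y = 2;
    let mut a: &i32 = &x;
    let mut b: &i32 = &y;
    swap_ptrs(&mut a, &mut b);
    (*a, *b)
}
```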

But I'd argue that most of the time regions do not need to be equal, but are subsets or supersets of each other, so reusing variable names makes more sense in my opinion.

Don't forget that using a region name twice may actually work with two different regions, so long as they are in a subtyping relationship. Region subtyping is key both to simplifying code and to understanding code after simplification.

In any case, I prefer a notation where region constraints are attached directly to the type instead of being expressed somewhere else. Something like this (explained below):
    void assign(int*(r)* p, int* r) { *p = r; }
    void swap(ref int*(b) a, ref int*(a) b);

Sure. I trust it's understood that this doesn't make anything any simpler or any easier to implement or understand. It's just a minor change in notation, and IMHO not for the better.

Here, a parenthesized suffix after a pointer indicates the region constraint of the pointer, based on the region of another pointer.


I thought it meant pointer to function. Oops.

In the first example, int*(r)* means that the integer pointer "*p" must not live beyond the value pointed to by "r" (because we're going to assign "r" to "*p"). In the second example, the value pointed to by "a" must not live longer than the one pointed to by "b", and the value pointed to by "b" must not live longer than the one pointed to by "a"; the net result is that they must have the same lifetime and need to be in the same region.
For something more complicated, you could give multiple comma-separated constraints:
    void choose(ref int*(a,b) result, int* a, int* b)
    {
        result = rand() > 0.5 ? a : b;
    }
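For comparison, here is the same `choose` written with a single named lifetime in Rust. It is a hedged sketch, not a proposal for D: `rand()` is replaced by a hypothetical `pick_a` flag so the result is deterministic, and `choose_demo` is likewise hypothetical. Both arguments are declared with `'r`; at the call site the compiler settles on the shorter of the two regions through subtyping, so the returned reference is only usable inside the inner block, which is why the value is copied out before the block ends.

```rust
/// Analogue of `choose(ref int*(a,b) result, int* a, int* b)`:
/// one lifetime name covers both inputs and the result.
fn choose<'r>(a: &'r i32, b: &'r i32, pick_a: bool) -> &'r i32 {
    if pick_a { a } else { b }
}

/// Hypothetical usage: the inputs live in regions of different length.
fn choose_demo(pick_a: bool) -> i32 {
    let long_lived = 10;
    let result;
    {
        let short_lived = 20;
        // `'r` is inferred as the inner block; copy the integer out.
        result = *choose(&long_lived, &short_lived, pick_a);
    }
    result
}
```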

This is all irrelevant. You essentially change the syntax. Syntax is, again, the least of the problems to be solved.

I suspect there are things you can't even express without symbolic regions. Consider this example from Dan's slides:
struct ILst(region R1, region R2) {
     int *R1 hd;
     ILst!(R1, R2) *R2 tl;
}
This code reflects the fact that the list holds pointers to integers in one region, whereas the nodes themselves are in a different region. It would be a serious challenge to tackle that without symbolic regions, and the result wouldn't be simpler for anybody.
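Rust can express the two-region list directly with two lifetime parameters. This is a minimal sketch of the `ILst` example, not Cyclone code: `hd` borrows an integer from region `'r1`, while `tl` borrows the next node from region `'r2`. The `sum` and `build_and_sum` helpers are hypothetical and only there to exercise the structure.

```rust
/// Analogue of `struct ILst(region R1, region R2)`: the integers live in
/// region `'r1`, the nodes themselves in region `'r2`.
struct ILst<'r1, 'r2> {
    hd: &'r1 i32,
    tl: Option<&'r2 ILst<'r1, 'r2>>,
}

/// Walk the list and add up the integers it points to.
fn sum(list: &ILst<'_, '_>) -> i32 {
    let mut total = *list.hd;
    let mut cur = list.tl;
    while let Some(node) = cur {
        total += *node.hd;
        cur = node.tl;
    }
    total
}

/// Hypothetical usage: integers in one array, nodes as separate locals.
fn build_and_sum() -> i32 {
    let ints = [1, 2, 3]; // storage for the `'r1` pointees
    let n3 = ILst { hd: &ints[2], tl: None }; // nodes form the `'r2` side
    let n2 = ILst { hd: &ints[1], tl: Some(&n3) };
    let n1 = ILst { hd: &ints[0], tl: Some(&n2) };
    sum(&n1)
}
```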
Today's templates are just fine for that. Just propagate variables through template arguments and apply region constraints to the members:
    struct ILst(alias var1, alias var2) {
        int*(var1) hd;
        ILst!(var1, var2)*(var2) tl;
    }
    int z;
    int*(z) a, b;
    ILst!(a, b) lst1;
    ILst!(&z, &z) lst2;

I hope you agree that these are just symbols written down without much meaning. This is not half-baked. It's not even rare. The cow is still moving. I can't eat that! :o) I can't even start replying to it because there are so many actual and potential issues that I'd need to get to work on them first.

We could even allow regions to propagate through type arguments too:

    struct ILst2(T1, T2) {
        int*(T1) hd;
        ILst2!(T1, T2)*(T2) tl;
    }
    ILst2!(typeof(&z), typeof(b)) lst3;
I think this example is a good case for attaching region constraints directly to types instead of expressing them as conditional expressions elsewhere, as in "if (region a <= region b)".

I am thoroughly lost here, sorry. I can't even answer "this is so wrong" or "this is pure genius". Probably it's somewhere in between :o). At any rate, I suggest you develop a solid understanding of Cyclone if you want to build something related to it.

[In the interest of coherence I snipped away unrelated parts of the discussion.]

I'm sorry about how you feel. Now we're in a conundrum of sorts. You seem to strongly believe you can make some nice simplified regions work, and make people like them. Taking that to a proof is hard. The conundrum is that you are facing the prospect of putting work into it and creating a system that, albeit correct, is not enticing.
Currently, I'm just trying to convince you (and any other potential silent listeners) that it can work.

I understand I've been blunt throughout this post, but please side with me for a minute. I'm doing so for the following reasons: (a) I'm essentially writing this post in negative time; (b) I believe you currently don't have an attack on the problem you're trying to solve; (c) I believe it's worthwhile for you to develop an attack on the problem; and (d) I think "we" = "the D community" should seriously consider safety and consequently things like region analysis.

You can now stop siding with me and side again with yourself. At this point you can easily guess that all of the above was to prepare you for an even blunter comment. Here goes.

You say you want to convince people "it can work". But right now there is no "it". You have no "it". Much less an "it" that can work.

But there is of course good hope that an "it" could emerge, and I encourage you to continue working towards that goal. It's just a lot more work than it might appear.




Andrei
