Re: Escape analysis (full scope analysis proposal)

Michel Fortin Wed, 12 Nov 2008 05:50:24 -0800

On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu <[EMAIL PROTECTED]> said:

Michel Fortin wrote:
I'd like to point out the two things people complained about the most regarding the automatic dynamic allocation for dynamic closures:
1.    There is no way to prevent it, to make sure there is no allocation.
2.    The compiler does allocate a lot more than necessary.

In my proposal, these two points are addressed:

1.    You can declare any variable as "scope", preventing it from being placed
    in a broader scope, preventing at the same time dynamic allocation.
2.    The compiler being aware of what arguments do and do not escape the
    scope of the called functions, it won't allocate unnecessarily.

So I think the situation would be much better.
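To make the two complaints concrete, here is a minimal sketch in present-day D2 terms, which may differ from what the proposal would specify. The function names (makeCounter, callTwice, sumOnStack) are made up for illustration. The first delegate escapes, so its captured frame has to live on the heap; the second is passed through a `scope` parameter, which tells the compiler the delegate is not kept, so no allocation is needed for it.

```d
// Escaping closure: `count` must outlive makeCounter's stack frame,
// so the captured frame is placed on the GC heap.
int delegate() makeCounter()
{
    int count = 0;
    return delegate int() { return ++count; };
}

// Non-escaping use: `scope` on the parameter promises that callTwice
// does not keep `dg` after it returns.
int callTwice(scope int delegate() dg)
{
    dg();
    return dg();
}

// The delegate literal captures `total`, but it is only handed to a
// `scope` parameter, so the frame can stay on the stack.
int sumOnStack()
{
    int total = 10;
    return callTwice(delegate int() { total += 1; return total; });
}
```

Each call to makeCounter yields an independent counter, while sumOnStack only borrows its own frame for the duration of the call.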
I agree that an escape analyzer would improve things. I am not sure that one oblivious to regions is expressive enough.

If you think I proposed a region-oblivious scheme, then you've got me wrong (and perhaps it's my fault for not explaining well enough). Let me explain again, and I'll try not to skip anything this time.

Cyclone has dynamic regions, regions which are allocated on the heap but that are deleted at the end of the scope that created them. Basically, those are scoped heaps offering a very useful system to automatically free memory. (It's somewhat similar in concept to Cocoa's NSAutoreleasePool, for instance.) The downside of them is that you need to pass region handles around (so that called functions can allocate objects within them).

So my first point is that since we have a garbage collector in D, and moreover since we're likely to get one heap per thread in D2, we don't need dynamic regions. The remaining regions are: 1) the shared heap, 2) the thread-local heap, 3) all the stack frames; and you can't allocate in stack frames other than the current one. Because none of these regions require a handle to allocate into, we (A) don't need region handles.

We still have many regions. Besides the two heaps (shared, thread-local), each function's stack frame, and each block within them, creates a distinct memory region. But nowhere do we need to know exactly which region a function parameter comes from; what we need to know is which address outlives which pointer, and then we can forbid assigning addresses to pointers that outlive them. All we need is a relative ordering of the various regions, and for that we don't need to attach *names* to the regions so that you can refer explicitly to them in the syntax. Instead, you could say something like "region of (x)", or "region of (*y)" and that would be enough.
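The ordering rule can be sketched as a tiny model. This is only an illustration under a simplifying assumption: regions are ranked by nesting depth along a single call chain, which is not something the proposal specifies. The names Region, outlivesOrEquals and mayStore are hypothetical.

```d
/// Hypothetical model of a region: only its relative lifetime matters.
/// depth 0 stands for a heap (lives longest); each nested stack frame
/// or block along one call chain adds 1.
struct Region
{
    int depth;
}

/// True when region `a` lives at least as long as region `b`.
bool outlivesOrEquals(Region a, Region b)
{
    return a.depth <= b.depth;
}

/// Storing an address in a pointer is allowed only if the address's
/// region lives at least as long as the pointer's region.
bool mayStore(Region pointerRegion, Region addressRegion)
{
    return outlivesOrEquals(addressRegion, pointerRegion);
}
```

Nothing here names a region; the check only compares "region of (pointer)" with "region of (address)", which is all the rule requires.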

So there is still a region for every pointer, only regions don't need to be *named* because you can always refer to them by referring to the variables. (And perhaps the syntax would be clearer with region names than without, in which case I don't mind if we use them. But they're not required for the concept to work.)

I'm not too thrilled by references. I once got a question from someone coming from C: what is the difference between a pointer and a reference in C++? I had to answer: references are pointers with a different syntax, no rebindability, and no possibility of being null. It seems he and I both agree that references are mostly a cosmetic patch to solve a syntactic problem. References in D aren't much different.
I disagree. References in D are very different. They are not type constructors. They are storage classes that can only be used in function signatures, which makes them impossible to dangle. I think C++ references would also have been much better off as storage classes instead of half-life types.


Which makes me think of this:

        struct A { int i; this(); }
        ref A foo(ref A a) { return a; }

        ref A bar()
        {
                foo(A()).i = 1;

                ref A a = foo(A()); // illegal, ref cannot be used outside function signature
                a.i = 1;

                return foo(A()); // illegal ?
        }

Also, I'd like to point out that ref (and out) being storage classes somewhat hinders me from using them where it makes sense in the D/Objective-C bridge, since there most functions are instantiated by templates where template arguments give the type of each function argument. Perhaps there should be a way to specify "ref" and "out" in template arguments...

If we could have a unified syntax for pointers of all kinds, I think it'd be more convenient than having two kinds of pointers. A null-forbidding but rebindable pointer would be more useful in my opinion than the current reference concept.
Well ref means "This function wants to modify its argument". That is a very different charter from what pointers mean. So I'm not sure how you say you'd much prefer this to that. They are not comparable.

I was under the impression that ref would be allowed as a storage class for local variables. I'll say it's perfectly acceptable for function arguments, but I'm less sure about function return types.

Also, I'd still like to have a non-null pointer type, especially for clarifying function signatures. A template can do. If it were in the language, however, it would be used by more people, which would be better.
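As a rough idea of what such a template could look like, here is a minimal sketch in present-day D2. The name NonNull and its get method are made up for illustration, and the check happens at run time on construction rather than at compile time. It stays rebindable through ordinary struct assignment, and it does not close every loophole (NonNull!T.init still holds null).

```d
/// Hypothetical non-null, rebindable pointer wrapper.
struct NonNull(T)
{
    private T* ptr;

    // Forbid default construction, which would leave ptr null.
    @disable this();

    /// Wraps `p`, rejecting null when assertions are enabled.
    this(T* p)
    {
        assert(p !is null, "NonNull constructed from a null pointer");
        ptr = p;
    }

    /// Access to the pointee; no null check is needed at each use.
    ref T get()
    {
        return *ptr;
    }
}
```

A signature such as `void update(NonNull!int target)` then documents that the callee never has to handle null, while the caller can still rebind a variable with `p = NonNull!int(&other);`.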

But I'd be curious what others think of it. Notice how the discussion participants got reduced to you and me, and from what I saw that's not a good sign.
Indeed. I'm interested in other opinions too.
But I'm under the impression that many lost track of what was being discussed, especially since we started referring to Cyclone which few are familiar with and probably few have read the paper.
In my experience, when someone is interested in something, she'd make time for it. So I take that as lack of interest. And hey, since when was lack of expertise a real deterrent? :o)

As I said below, I think many people in this group are already comfortable with using pointers, which may explain why they're not so interested. Having no one interested in something doesn't necessarily mean they won't appreciate it when it comes.

It does, however, reduce the incentive for continuing forward. So I understand why you're backing off, even if it displeases me somewhat.

One of the fears expressed at the start of the thread was about excessive need for annotation, but as the Cyclone paper says, with good defaults, you need to add scoping annotations only in a few specific places. (It took me some time to read the paper and start discussing things sanely after that, remember?) So perhaps we could get more people involved if we could propose a tangible syntax for it.
To be very frank, I think we are very far from having an actual proposal, and syntax is of very low priority now if you want to put one together. Right now what we have is a few vague ideas and conjectures (e.g., there's no need for named regions because the need would be rare enough to require dynamic allocation for those cases). I'm not saying that to criticize, but merely to underline the difficulties.

I never said the need for dynamic regions would be rare: I said the garbage collector obsoletes them. If we can justify the need for dynamic regions later, we can add them back (with all the added complexity that requires), but I'd try without them first.

Or perhaps not; for advanced programmers who already understand well what can and cannot be done by passing pointers around, full escape analysis may not seem to be such an interesting gain since they've already adopted the right conventions to avoid most bugs it would prevent. And most people here who can discuss this topic with some confidence are not newbies to programming and don't make too many mistakes of the sort anymore.
Which makes me think of beginners saying pointers are hard. You've certainly seen beginners struggle as they learn how to correctly use pointers in C or C++. Making sure their programs fail at compile-time, with an explanatory error message as to why they mustn't do this or that, is certainly going to help their experience learning the language more than cryptic and frustrating segfaults and access violations at runtime, sometimes far from the source of the problem.
I totally agree that pointers are hard and good static checking for them would help. Currently, what we try to do is obviate the need for pointers in most cases, and to actually forbid them in safe modules.

But dynamic arrays *are* pointers, how are you obviating the need for them? If you find a solution for dynamic arrays, you'll have a solution for pointers too.

You could forbid dynamic arrays from referring to stack-allocated static ones, or automatically dynamically allocate those when they escape in a dynamic array. And if I were you, whatever you choose for arrays I'd allow it for pointers too, to keep things consistent. Pointers to heap objects should be retained in my opinion.
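The two situations can be sketched in present-day D2, with the heap copy written by hand through `.dup` rather than inserted automatically by the compiler as suggested above. The function names (sum, noEscape, escapeCopy) are made up for illustration.

```d
// Reads through the slice but does not keep it anywhere.
int sum(const(int)[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

// The slice of the stack array never leaves this frame: no allocation needed.
int noEscape()
{
    int[3] onStack = [1, 2, 3];
    return sum(onStack[]);
}

// The result outlives this frame, so the data is copied to the GC heap first.
int[] escapeCopy()
{
    int[3] onStack = [1, 2, 3];
    int[] view = onStack[];   // dynamic array pointing into the stack frame
    return view.dup;          // heap copy; returning `view` itself would dangle
}
```

An escape analyzer would in effect have to tell noEscape from escapeCopy on its own and either reject the escaping slice or insert the copy.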

The question that remains is, how many unsafe modules are necessary, and what liability do they entail? If there are few and not too unwieldy, maybe we can declare victory without constructing an escape analyzer. I agree if you or anyone says they don't think so. At this point, I am not sure, but what I can say is that it's good to reduce the need for pointers regardless.

But are you reducing the need for pointers or hiding and restricting them? I'd say the latter. References are pointers with restrictions. Object references are no different from pointers except in syntax (they can even point to stack-allocated objects with scope classes). Dynamic arrays are pointers with a certain range. Closures have a pointer to a stack frame, which can be heap-allocated or not.

The only way to have a safe system without escape analysis is to force everything they can point to to be on the heap, or prevent them from escaping at all (as with ref). I wish there could be some consistency here.



--
Michel Fortin
[EMAIL PROTECTED]
http://michelf.com/
