On 2008-11-13 00:53:50 -0500, Andrei Alexandrescu <[EMAIL PROTECTED]> said:

Michel Fortin wrote:
Everywhere I said there was no need for named regions, I also said named regions could be kept to ease the syntax. That said, I'm not so sure named regions are that good at simplifying the syntax. In your assign example above, the named-region version has an error: it forces the two pointers to be of the same region. That could be fine, but, assuming you're assigning to *p, it'd be more precise to write it like that:

    void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);

No, the code is correct as written (without the if). You may want to reread the paper with an eye for region subtyping rules. This partly backs up my point: understanding region analysis may be quite a burden for the average programmer. Even you, who took pains to think through everything and absorb the paper, are having trouble. And me too to be honest :o).

Ok, I've reread that part and it's true that using Cyclone's subtyping rules it'd work fine with only one region name because Cyclone implicitly creates two regions from that, the first being a subset of the other, just as I wrote explicitly here. But what I missed out was one of Cyclone's syntactic construct, not a concept of regions.

... or perhaps we have a different notion of what is a syntax and what is a concept?


Once we get there, I think the no-named region syntax is better.

This is invalidated by the wrong assertion above.

Yes and no. It's true that Cyclone's region subtyping makes the syntax prettier. On the other side, the programmer has to be aware of how it works, and especially aware that changing the order his arguments will implicitly change the region relationship between them.


That said, for the swap example, where both values need to share the same region, the named region notation is simpler:

    void swap(region R)(int*R a, int*R b);
    void swap(int* a, int* b) if (region(a) == region(b));

No, for that swap there is no need to specify any region. You can swap ints in any two regions. Probably you meant to use int** throughout.

Hum, you're right, I meant to make these "ref int*".


But I'd argue that most of the time regions do not need to be equal, but are subset or superset of each other, so reusing variable names makes more sense in my opinion.

Don't forget that using a region name twice may actually work with two different regions, so far as they are in a subtyping relationship. Region subtyping is key to both simplifying code and to understanding code after simplification.

I'm not convinced that region subtyping is so simple to understand for neophytes, especially because you may assume the same region at first glance. Cyclone isn't C++, but this region subtyping rule makes me think of one of those many little known corners in C++ such as Koenig name lookup.

But I consider this just a syntactic issue about how to express regions though. And I may be completely wrong about its unintuitiveness.


In any case, I prefer a notation where regions constrains are attached directly to the type instead of being expressed somewhere else. Something like this (explained below):

    void assign(int*(r)* p, int* r) { *p = r; }
    void swap(ref int*(b) a, ref int*(a) b);

Sure. I'm sure there's understanding that that doesn't make anything any simpler or any easier to implement or understand. It's just a minor change in notation, and IMHO not to the better.

Ok, then we disagree here. I think this notation is better because it makes you think about things in term of pointer lifetime vs. the pointed data lifetime, which I think is much less abstract than variables being part of different regions where some regions encompass other regions. It's a shift in perspective from the syntactic approach of Cyclone, although under the hood the compiler would do mostly the same work.


Here, a parenthesis suffix after a pointer indicates the region constrain of the pointer, based on the region of another pointer.

I thought it means pointer to function. Oops.

And I though the syntax was the least of your concern right now? :-) This probably can't be the final syntax, but I think it makes things clear enough talk about about the concepts... for now.


In the first example, int*(r)* means that the integer pointer "*p" must not live beyond the value pointed by "r" (because we're going to assign "r" to "*p"). In the second example, the value pointed by "a" must not live longer than the one pointed by "b" and the value pointed by "b" must not live longer than the one pointed "a"; the net result is that they must have the same lifetime and need to be in the same region.

For something more complicated, you could give multiple commas-separated constrains:

    void choose(ref int*(a,b) result, int* a, int* b)
    {
        result = rand() > 0.5 ? a : b;
    }

This all is irrelevant. You essentially change the syntax. Syntax is, again, the least of the problems to be solved.

Ok then. Let's go to the real problems.


I suspect there are things you can't even express without symbolic regions. Consider this example from Dan's slides:

struct ILst(region R1, region R2) {
     int *R1 hd;
     ILst!(R1, R2) *R2 tl;
}

This code reflects the fact that the list holds pointer to integers in one region, whereas the nodes themselves are in a different region. It would be a serious challenge to tackle that without symbolic regions, and simpler that won't be for anybody.

Today's templates are just fine for that. Just propagate variables through template arguments and apply region constrains to the members:

    struct ILst(alias var1, alias var2) {
        int*(var1) hd;
        ILst!(var1, var2)*(var2) tl;
    }
        int z;
    int*(z) a, b;
    ILst!(a, b) lst1;
    ILst!(&z, &z) lst2;

I hope you agree that this is just written symbols without much meaning. This is not half-baked. It's not even rare. The cow is still moving. I can't eat that! :o) I can't even start replying to it because there are so many actual and potential issues, I'd need to get to work on them first.

If you mean there aren't any explanation, then you're right that explanations were somewhat missing from my last post. Sorry. I guess I was too tired to notice the lack of instructions.

Basically you apply the same rules as for the function signatures in the preceding function examples. For instance, "int*(var1)" means the ht pointer points to an int that lives at least as long as the one pointed by var1 (var1 must be an "int*" pointer). This means that you can assign the content of var1 to it, or anything else that will live at least as long as var1. It also mean you can take its value and place it in var1, or any pointer with a shorter life.

Then, we have "ILst!(var1, var2)*(var2)". It's the same rules as the first, except that we have a different type beyond the pointer which must be valid through var2's lifetime.

The last code snippet shows how to use that template.

   int z;
   int*(z) a, b;
   ILst!(a, b) lst1;
   ILst!(&z, &z) lst2;

Here, we're declaring "int*(z)", which is a pointer to an int whose lifetime is equal or longer than the address of z. (ok, there's an error here, it should have been "int*(&z)"). And normally, you wouldn't explicitly write that, "int*" would be enough: the compiler should determine the default constrains automatically.

Then when you instanciate ILst!(a, b), the template will take the lifetime of a and b (which is the lifetime of the address of z) and apply it to pointers inside the struct.


We could even allow regions to propagate through type arguments too:

    struct ILst2(T1, T2) {
        int*(T1) hd;
        ILst2!(T1, T2)*(T2) tl;
    }
    ILst2!(typeof(&z), typeof(b)) lst3;

Again, some explanations were missing... Basically, region/scoping/lifetime constrains are attached to pointers. Which means that propagating a type ought to be enough to propagate the lifetime constrains too. "ILst2!(typeof(&z), typeof(b))" is exactly the same as "ILst!(&z, b)". ILst takes its constrains from variables while ILst2 takes its constrains from types.

But the two previous examples are a little stretched to make the concept more similar to Cyclone. With my proposal, you can do much better than this.

I think in most cases where you want to propagate constrains, you'll want to propagate a type too. If what you want is a linked list, it'd be better expressed generically like this:

        struct ListRoot(T) {
                ListNode!(T)* first;
        }
   struct ListNode(T) {
       T hd;
       ILst2!(T)* tl;
   }

        int global;
        void foo() {
                int a;
                ListRoot!(int*) listRoot;
                ListNode!(int*) listNode;
                listRoot.first = &listNode;
                listNode.hd = &a;
                listNode.hd = &global;
        }

Notice how there is absolutely no special annotation here; it's already valid template code.

Now, let the compiler apply some defaults according to these rules: types declared in local variables will be allowed to point to values of their own region, and structs members will be allowed to point to values of the same region the struct comes from. Annotated explicitly, the default annotations would look like this:

        struct ListRoot(T) {
                ListNode!(T)*(this) first; // pointer to something in the same 
region as this
        }
   struct ListNode(T) {
       T value; // if T is a pointer, it holds its own region annotations
ILst2!(T)*(this) next; // pointer to something in the same region as this
   }

        int global;
        void foo() {
                int a;
                ListRoot!(int*(&listRoot)) listRoot;
                ListNode!(int*(&listNode)) listNode;
                listRoot.first = &listNode;
                listNode.value = &a;
                listNode.value = &global;
        }

With this scheme, the lifetime of all nodes in the linked list need to be equal or longer than the one of the preceding node (normally, they will all be equal), and the lifetime of the value pointer is determined by the type you give as a template argument to ListRoot and ListNode. Therefore, it becomes possible to construct the linked list on the stack when the root is on the stack, with no need for explicit annotations.

There is still one problem though. If you want to swap two nodes, you can't, because there is no guarenty that the lifetime of the "this" pointer of a ListNode is equal to lifetime of the "next" pointer. (In fact, the next pointer lifetime is longer or equal to the struct lifetime). So if we're going to swap or reorder nodes, we'll need a way to constrain the "this" pointer against the "next" pointer to create a circular reference and thus forcing the two pointers to point to the same region... perhaps something like this:
        
   struct ListNode(T) {
                ListNode*(next) this;
       T value;
       ILst2!(T)*(this) next;
   }

Not a very good syntax though.


I think this example is a good case for attaching region constrains directly to types instead of expressing them as conditional expressions elsewhere, as in "if (region a <= region b)".

I am thoroughly lost here, sorry. I can't even answer "this is so wrong" or "this is pure genius". Probably it's somewhere in between :o). At any rate, I suggest you develop a solid understanding of Cyclone if you want to build something related to it.

I'll side with "pure genius", but I also consider myself biased. :-)


I'm sorry about how you feel. Now we're in a conundrum of sorts. You seem to strongly believe you can make some nice simplified regions work, and make people like them. Taking that to a proof is hard. The conundrum is, you are facing the prospect of putting work into it and creating a system that, albeit correct, is not enticing.

Currently, I'm just trying to convince you (and any other potential silent listeners) that it can work.

I understand I've been blunt throughout this post, but please side with me for a minute. I'm doing so for the following reasons: (a) I'm essentially writing this post in negative time; (b) I believe you currently don't have an attack on the problem you're trying to solve; (c) I believe it's worthwhile for you to develop an attack on the problem, (d) I think "we" = "the D community" should seriously consider safety and consequently things like region analysis.

I don't mind about (a) and I agree about (d).

I'll say that because of my lack of expertise with Cyclone I have some difficulty expressing my proposal as a comparaison of what is different from Cyclone (it's difficult enough without it). You're the one asking for such a comparison and increasing the difficulty. I do not dislike the challenge, but I don't think you can take this as a proof that I don't understand well the problem I'm trying to solve when I may just be mixing some things about the approach taken by Cyclone.

Another thing not helping is that my original proposal has evolved a little since the first time I started the "full scope analysis proposal" thread. I also revamped the syntax I use to talk about the problem (and apparently I should do it again to avoid a conflicts with function names). Hunting in previous post the details I leave out in the more recent ones doesn't help anyone understanding what I'm talking about.

I'm thinking that maybe I should put everything in one document to have a coherent proposal that could evolve as a whole instead of one scattered on various post between which the syntax I use and some concepts have evolved.


You can now stop siding with me and side again with yourself. At this point you can easily guess that all of the above was to prepare you for an even blunter comment. Here goes.

You say you want to convince people "it can work". But right now there is no "it". You have no "it". Much less an "it" that can work.

But there is of course good hope that an "it" could emerge, and I encourage you to continue working towards that goal. It's just a lot more work than it might appear.

I'm pretty sure I hold that "it" just now, or something very near it. It's just that it seems I haven't explained it well enough for you (and probably anyone) to understand correctly. I should probably write it all down in one coherent and more formal document rather than scattering all the details over many different posts as half-documented concept-name-changing written-too-fast examples.


--
Michel Fortin
[EMAIL PROTECTED]
http://michelf.com/

Reply via email to