Michel Fortin wrote:
On 2008-11-12 10:02:02 -0500, Andrei Alexandrescu
<[EMAIL PROTECTED]> said:
Michel Fortin wrote:
On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu
<[EMAIL PROTECTED]> said:
So my first point is that since we have a garbage collector in D, and
moreover since we're likely to get one heap per thread in D2, we
don't need dynamic regions. The remaining regions are: 1) the shared
heap, 2) the thread-local heap, 3) All the stack frames; and you
can't allocate other stack frames than the current one. Because none
of these regions require a handle to allocate into, we (A) don't need
region handles.
We still have many regions. Beside the two heaps (shared,
thread-local), each function's stack frame, and each block within
them, creates a distinct memory region. But nowhere we need to know
exactly which region a function parameter comes from; what we need to
know is which address outlives which pointer, and then we can forbid
assigning addresses to pointers that outlive them. All we need is a
relative ordering of the various regions, and for that we don't need
to attach *names* to the regions so that you can refer explicitly to
them in the syntax. Instead, you could say something like "region of
(x)", or "region of (*y)" and that would be enough.
But how do you type then the assignment example?
void assign(int** p, int * r) { *p = *r; }
How do you reflect the requirement that r's region outlives *p's region?
But that's not even the point. Say you define some notation, such as:
void assign(int** p, int * r) if (region(r) <= region(p));
But the whole point of regions was to _simplify_ notations like the
above into:
void assign(region R)(int*R* p, int *R r);
So although you think you simplified things by using region(symbol)
instead of symbolic names, you complicated things. The compiler still
needs to infer regions for each value, so it is as complicated as a
named-regions compiler, and in addition you require the user to write
bulkier expressions because you disallow use of symbols. So everybody
is worse off. Note how in the example using a symbolic region the
outlives relationship is enforced implicitly by using the same symbol
name in two places.
Everywhere I said there was no need for named regions, I also said named
regions could be kept to ease the syntax. That said, I'm not so sure
named regions are that good at simplifying the syntax. In your assign
example above, the named-region version has an error: it forces the two
pointers to be of the same region. That could be fine, but, assuming
you're assigning to *p, it'd be more precise to write it like that:
void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);
No, the code is correct as written (without the if). You may want to
reread the paper with an eye for region subtyping rules. This partly
backs up my point: understanding region analysis may be quite a burden
for the average programmer. Even you, who took pains to think through
everything and absorb the paper, are having trouble. And me too to be
honest :o).
Once we get there, I think the no-named region syntax is better.
This is invalidated by the wrong assertion above.
That
said, for the swap example, where both values need to share the same
region, the named region notation is simpler:
void swap(region R)(int*R a, int*R b);
void swap(int* a, int* b) if (region(a) == region(b));
No, for that swap there is no need to specify any region. You can swap
ints in any two regions. Probably you meant to use int** throughout.
But I'd argue that most of the time regions do not need to be equal, but
are subset or superset of each other, so reusing variable names makes
more sense in my opinion.
Don't forget that using a region name twice may actually work with two
different regions, so far as they are in a subtyping relationship.
Region subtyping is key to both simplifying code and to understanding
code after simplification.
In any case, I prefer a notation where regions constrains are attached
directly to the type instead of being expressed somewhere else.
Something like this (explained below):
void assign(int*(r)* p, int* r) { *p = r; }
void swap(ref int*(b) a, ref int*(a) b);
Sure. I'm sure there's understanding that that doesn't make anything any
simpler or any easier to implement or understand. It's just a minor
change in notation, and IMHO not to the better.
Here, a parenthesis suffix after a pointer indicates the region
constrain of the pointer, based on the region of another pointer.
I thought it means pointer to function. Oops.
In the
first example, int*(r)* means that the integer pointer "*p" must not
live beyond the value pointed by "r" (because we're going to assign "r"
to "*p"). In the second example, the value pointed by "a" must not live
longer than the one pointed by "b" and the value pointed by "b" must not
live longer than the one pointed "a"; the net result is that they must
have the same lifetime and need to be in the same region.
For something more complicated, you could give multiple commas-separated
constrains:
void choose(ref int*(a,b) result, int* a, int* b)
{
result = rand() > 0.5 ? a : b;
}
This all is irrelevant. You essentially change the syntax. Syntax is,
again, the least of the problems to be solved.
I suspect there are things you can't even express without symbolic
regions. Consider this example from Dan's slides:
struct ILst(region R1, region R2) {
int *R1 hd;
ILst!(R1, R2) *R2 tl;
}
This code reflects the fact that the list holds pointer to integers in
one region, whereas the nodes themselves are in a different region. It
would be a serious challenge to tackle that without symbolic regions,
and simpler that won't be for anybody.
Today's templates are just fine for that. Just propagate variables
through template arguments and apply region constrains to the members:
struct ILst(alias var1, alias var2) {
int*(var1) hd;
ILst!(var1, var2)*(var2) tl;
}
int z;
int*(z) a, b;
ILst!(a, b) lst1;
ILst!(&z, &z) lst2;
I hope you agree that this is just written symbols without much meaning.
This is not half-baked. It's not even rare. The cow is still moving. I
can't eat that! :o) I can't even start replying to it because there are
so many actual and potential issues, I'd need to get to work on them first.
We could even allow regions to propagate through type arguments too:
struct ILst2(T1, T2) {
int*(T1) hd;
ILst2!(T1, T2)*(T2) tl;
}
ILst2!(typeof(&z), typeof(b)) lst3;
I think this example is a good case for attaching region constrains
directly to types instead of expressing them as conditional expressions
elsewhere, as in "if (region a <= region b)".
I am thoroughly lost here, sorry. I can't even answer "this is so wrong"
or "this is pure genius". Probably it's somewhere in between :o). At any
rate, I suggest you develop a solid understanding of Cyclone if you want
to build something related to it.
[In the interest of coherence I snipped away unrelated parts of the
discussion.]
I'm sorry about how you feel. Now we're in a conundrum of sorts. You
seem to strongly believe you can make some nice simplified regions
work, and make people like them. Taking that to a proof is hard. The
conundrum is, you are facing the prospect of putting work into it and
creating a system that, albeit correct, is not enticing.
Currently, I'm just trying to convince you (and any other potential
silent listeners) that it can work.
I understand I've been blunt throughout this post, but please side with
me for a minute. I'm doing so for the following reasons: (a) I'm
essentially writing this post in negative time; (b) I believe you
currently don't have an attack on the problem you're trying to solve;
(c) I believe it's worthwhile for you to develop an attack on the
problem, (d) I think "we" = "the D community" should seriously consider
safety and consequently things like region analysis.
You can now stop siding with me and side again with yourself. At this
point you can easily guess that all of the above was to prepare you for
an even blunter comment. Here goes.
You say you want to convince people "it can work". But right now there
is no "it". You have no "it". Much less an "it" that can work.
But there is of course good hope that an "it" could emerge, and I
encourage you to continue working towards that goal. It's just a lot
more work than it might appear.
Andrei