Re: Closures, compile time, pad protos
On Thu, Nov 23, 2006 at 05:09:17PM -0500, Buddha Buck wrote:
> The way I see it, everything which defines a separate lexical scope (a block, a function, a closure. I forget if in "my $a; ... ; my $b" $b is visible in the ellipsis. If not, then a my statement also defines a separate lexical scope) effectively creates a separate pad, at run-time, when it is entered.

In "my $a; ... ; my $b", $b should not be visible in the ellipsis (it is in pugs, and that's a bug). However, there's no need for each my to define a separate scope, because the invisibility of $b in the block prior to its declaration is only required for compile-time processing. If you hide a reference to $b in an eval string, the spec explicitly allows the runtime to use the lexical $b defined later in the block.
Re: Closures, compile time, pad protos
On 11/22/06, Anatoly Vorobey wrote:
> First of all, thanks a lot for your comments.
>
> On Wed, Nov 22, 2006 at 06:43:12PM -0500, Buddha Buck wrote:
> >     { my $x = something(); if $x==1 { ...code... } }
> >
> > My experience with other statically typed but extremely flexible languages is that the pads tend to be arranged in (possibly interconnected) linked lists. In this example, I see potentially three pads linked by the time ...code... is called: one containing the local variables defined in ...code..., one containing the visibly defined $x, and one visible outside that scope. A reference to $x in ...code... will traverse the linked list until it finds an $x, presumably finding the one defined in the sample code.
>
> Agreed. By the way, can you offer a perspective on how the pads get linked up, at runtime? I see each block as having a compile-time pad, or proto-pad, filled with values known at compile-time; and every time the block is entered, a new pad is cloned from the proto-pad. At that point its OUTER reference leads to the proto-pad of the outer block, and we want to link it up to the real pad of the outer block.

The way I see it, everything which defines a separate lexical scope (a block, a function, a closure. I forget if in "my $a; ... ; my $b" $b is visible in the ellipsis. If not, then a my statement also defines a separate lexical scope) effectively creates a separate pad, at run-time, when it is entered. The pad contains all the variables defined in that lexical scope, and a link to the pad for the surrounding lexical scope. The search for a variable is done by looking up the variable in the current pad, and if not found, recursively searching all linked pads until it is found or you run out of pads.

There are reasonable optimizations that can be made. If a lexical scope doesn't create any variables, it can reuse the same pad as its enclosing lexical scope.
If a lexical scope uses only part of an enclosing pad, the enclosing pad could be broken into two pieces, linked together, such that only part of it has to be searched or survives with the enclosed scope, etc.

I haven't read any implementation details as to how Perl6 handles it, so I'm going to use the following notation: if $p is a pad, then $p.lookup('$var') returns the value of the variable $var in $p, $p.myvars is a hash containing the local variables defined in $p, and $p.outer is the pad of the lexically enclosing scope. I think in p6 notation, that would be...

    class Pad {
        has %!myvars;
        has Pad $.outer;
        # Return the value of $var from the nearest pad that defines it.
        method lookup(Str $var) {
            return %!myvars{$var} if exists %!myvars{$var};
            return $.outer.lookup($var);
        }
        # Assign to $var in the nearest pad that defines it.
        method set(Str $var, $val) {
            return (%!myvars{$var} = $val) if exists %!myvars{$var};
            return $.outer.set($var, $val);
        }
        ...
    }

> One way to do it is to simply say: when we enter the inner block from the outer block, at that point we can re-link the inner block from the outer proto-pad to the outer pad we entered from. That by itself works, but I'm having trouble understanding what happens during a sub call rather than entering the block normally. For example:
>
>     {
>         my $x = 1;
>         sub foo { $x; }
>         bar();
>     }
>     sub bar() { foo(); }
>
> Here we definitely want foo() to see $x==1 (I think), but we get to foo() via criss-crossing through bar(), and so how would foo() know where to find the right pad as its outer reference?

I did some experiments with pugs based on explicitly separating what is visible at compile time from what is visible at run time. Specifically, I used the following code (the comments show what each statement printed):

    my $x = 25;
    sub bar {
        my $x = 1;
        sub foo { print ++$x; }
        print $x;
    }
    print $x;  # 25
    foo();     # 1
    foo();     # 2
    bar();     # 1
    foo();     # 2
    foo();     # 3
    bar();     # 1
    foo();     # 4

Let me call the proto-pads of foo and bar $foo0 and $bar0, respectively. From what I see, foo is visible before bar is run (which was sort of unexpected to me, but reasonable). Let's see what happens...
The statement sub bar{...} appears to set up a proto-pad $bar0 which contains an $x, but doesn't put in any values until bar is run. Everything is undef.

The statement sub foo{...} also sets up a proto-pad $foo0, which is empty. It is linked, however, to the proto-pad for bar ($foo0.outer = $bar0).

Running foo(); before the bar() instantiates a pad $foo1 (= copy($foo0)) for this invocation of foo, a copy of its proto-pad. Since this links to $bar0, when ++$x is done, it modifies the $x in $bar0 to 1. At the end of the call, $foo1 is garbage, waiting on collection.

The next call of foo(); does something similar: $foo2 = copy($foo0), the $x in $bar0 gets accessed and incremented to 2, and $foo2 goes poof.

Running bar(); instantiates a pad $bar1 = copy($bar0) for this invocation of bar. In theory, the $x in this instantiation is 2, but the my statement sets it to 1. More importantly, the sub foo{...} is finally encountered at run-time, and there is a current lexical scope available for it. Since the code is already
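The part of this walk-through that survives can be replayed with a small model. The following is only a sketch in Python, not pugs' actual implementation: the `Pad` class, its `clone` method, and the `call_foo` helper are hypothetical names chosen for illustration, and undef is modelled as `None` (counted as 0 by `++`). It covers the steps up to the first run of bar() and does not guess at the re-linking step that the message breaks off before describing.

```python
class Pad:
    """Hypothetical pad: local variables plus a link to the enclosing pad."""

    def __init__(self, myvars=None, outer=None):
        self.myvars = dict(myvars or {})
        self.outer = outer

    def _find(self, name):
        # Walk the chain of pads until one of them defines the name.
        pad = self
        while pad is not None:
            if name in pad.myvars:
                return pad
            pad = pad.outer
        raise NameError(name)

    def lookup(self, name):
        return self._find(name).myvars[name]

    def set(self, name, value):
        self._find(name).myvars[name] = value

    def clone(self):
        # Copy the local variables; the outer link is shared with the original.
        return Pad(self.myvars, self.outer)


def call_foo(foo_proto):
    """Model one call of foo: clone its proto-pad, then do ++$x."""
    pad = foo_proto.clone()
    value = (pad.lookup("$x") or 0) + 1  # undef (None) counts as 0
    pad.set("$x", value)
    return value


global_pad = Pad({"$x": 25})
bar0 = Pad({"$x": None}, outer=global_pad)  # proto-pad of bar, $x still undef
foo0 = Pad({}, outer=bar0)                  # proto-pad of foo, linked to bar0

first = call_foo(foo0)   # increments the $x that lives in bar0
second = call_foo(foo0)

bar1 = bar0.clone()      # bar() runs: fresh pad copied from the proto-pad
bar1.set("$x", 1)        # my $x = 1 overwrites the copied value

print(first, second, bar0.lookup("$x"), bar1.lookup("$x"))
```

Under these assumptions the two early calls of foo() see 1 and 2 because they write into the proto-pad of bar, while the global $x stays at 25.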
Closures, compile time, pad protos
Hi,

Anatoly and I don't know what this bit of code prints:

    foo();
    foo();
    for 1..3 {
        my $x ::= 3;
        sub foo { say ++$x };
        say ++$x
    };

Is it 4, 5, 6, 6, 6 or 4, 5, 3, 3, 3? It's almost definitely not 4, 5, 6, 7, 8. I can't rationalize 4, 5, 6, 7, 8 while maintaining the notion that $x is actually lexical.

To rationalize the other examples:

4, 5, 6, 6, 6 means that the foo declaration does not capture over an instance of the $x variable, but the actual value in the pad proto itself (the value that will be the default value of newly allocated $x variables).

4, 5, 3, 3, 3 means that at compile time all variables are instantiated once for BEGIN time captures. Observe:

    foo();
    bar();
    for 1..3 {
        my $x;
        sub foo { say ++$x }
        sub bar { say ++$x }
        say ++$x;
    }

prints 1, 2, 1, 1, 1 because $x is allocated once at compile time and captured into both foo and bar, and then separately allocated once more for each iteration of the loop.

If this is indeed the case, then there is a semantics problem:

    foo();
    foo();
    for 1..3 {
        my $x;
        BEGIN { $x = 3 };
        sub foo { say ++$x };
        say ++$x
    };

must be 4, 5, 1, 1, 1. This is because BEGIN { } and foo share the same compile-time allocated copy of $x, but this is not the copy in the loop.

A related issue is:

    foo();
    foo();
    for 1..3 {
        my $x = 10;
        sub foo { say ++$x };
        say ++$x;
    }

Is that 11, 12, 10, 10, 10, or 11, 12, 13, 13, 13, or 1, 2, 10, 10, 10?

Lastly,

    sub foo { my $x; sub { sub { say ++$x } } };
    my $bar = foo();
    my $gorch = $bar.();
    $gorch.();
    $gorch.();
    my $quxx = $bar.();
    $quxx.();
    $quxx.();

obviously results in the sequence 0, 1, but does the second call to $bar create a new sequence in $quxx, or is that instance of $x shared between $gorch and $quxx? Intuitively I'd say it is shared, which means that the outer sub declaration implicitly captures $x as well. Can anyone confirm?

Obviously

    my $zot = foo().();
    $zot.();
    $zot.();

does create a new sequence.
-- Yuval Kogman http://nothingmuch.woobling.org 0xEBD27418
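For comparison only, the last example above can be transcribed into Python, where the answer is well defined. This is an analogy and says nothing about what Perl 6 should do. The names `middle` and `inner` are hypothetical stand-ins for the two anonymous subs, and `x` starts at 0 because Python has no undef that auto-increments.

```python
def foo():
    x = 0

    def middle():
        # middle never mentions x itself, yet it must carry x along
        # so that inner can reach it.
        def inner():
            nonlocal x
            x += 1
            return x
        return inner

    return middle


bar = foo()

gorch = bar()
g = [gorch(), gorch()]

quxx = bar()
q = [quxx(), quxx()]   # continues the same sequence: x is shared

zot = foo()()
z = [zot(), zot()]     # a fresh call of foo() starts a new sequence

print(g, q, z)
print(bar.__code__.co_freevars)
```

In CPython the intermediate function does capture `x` implicitly (it shows up in `bar.__code__.co_freevars`), which is the behaviour guessed at above for the outer sub declaration.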
Re: Closures, compile time, pad protos
And what about:

    foo();
    for 1..3 {
        my $x ::= 3;
        sub foo { say ++$x };
        say ++$x
    };
    BEGIN { foo(); foo(); }

or worse:

    sub moose { my $x = 3; sub foo { say ++$x; } BEGIN { foo(); moose(); foo(); } foo(); moose(); foo();

*foam oozes out of ears*

-- Yuval Kogman http://nothingmuch.woobling.org 0xEBD27418
Re: Closures, compile time, pad protos
Yuval Kogman wrote on 2006-11-22 16:01 (+0200):
> my $x ::= 3; sub foo { say ++$x };

Why would you be allowed to ++ this $x? It's bound to an rvalue!

-- Kind regards, Juerd Waalboer: Perl hacker http://juerd.nl/sig
Convolution: ICT solutions and consultancy
I do not trust voting computers. See http://www.wijvertrouwenstemcomputersniet.nl/.
Re: Closures, compile time, pad protos
On Wed, Nov 22, 2006 at 18:55:15 +0100, Juerd wrote:
> Yuval Kogman wrote on 2006-11-22 16:01 (+0200):
> > my $x ::= 3; sub foo { say ++$x };
> Why would you be allowed to ++ this $x? It's bound to an rvalue!

Perhaps

    my $x ::= BEGIN { Scalar.new( :value(3) ) }

What we meant to be doing was to pre-set this value at compile time to 3. That doesn't really matter though.

-- Yuval Kogman http://nothingmuch.woobling.org 0xEBD27418
Re: Closures, compile time, pad protos
First of all, thanks a lot for your comments.

On Wed, Nov 22, 2006 at 06:43:12PM -0500, Buddha Buck wrote:
>     { my $x = something(); if $x==1 { ...code... } }
>
> My experience with other statically typed but extremely flexible languages is that the pads tend to be arranged in (possibly interconnected) linked lists. In this example, I see potentially three pads linked by the time ...code... is called: one containing the local variables defined in ...code..., one containing the visibly defined $x, and one visible outside that scope. A reference to $x in ...code... will traverse the linked list until it finds an $x, presumably finding the one defined in the sample code.

Agreed. By the way, can you offer a perspective on how the pads get linked up, at runtime? I see each block as having a compile-time pad, or proto-pad, filled with values known at compile-time; and every time the block is entered, a new pad is cloned from the proto-pad. At that point its OUTER reference leads to the proto-pad of the outer block, and we want to link it up to the real pad of the outer block.

One way to do it is to simply say: when we enter the inner block from the outer block, at that point we can re-link the inner block from the outer proto-pad to the outer pad we entered from. That by itself works, but I'm having trouble understanding what happens during a sub call rather than entering the block normally. For example:

    {
        my $x = 1;
        sub foo { $x; }
        bar();
    }
    sub bar() { foo(); }

Here we definitely want foo() to see $x==1 (I think), but we get to foo() via criss-crossing through bar(), and so how would foo() know where to find the right pad as its outer reference?

Which leads to the natural idea of maintaining a runtime global stack of dynamically entered scopes, covering both scopes entered via sub calls and scopes entered by just going into an inner block.
Then, any time we enter a block, we can search back through the stack and find the most recent pad on it that is _a_ pad of our outer lexical block, and call that our OUTER. Is that how this is usually done?

This way takes care of the criss-crossing example above, but I still don't quite understand what to do about calls deeply up and down the lexical hierarchy; consider a contrived example like

    {
        my $x = 1;
        { { { sub bar() { $x; } } } }
        sub foo() {
            { { { { { sub baz { $x; } } } } } }
            bar();
            baz();
        }
    }

Here baz() is a few levels below foo(), lexical-wise, while bar() is on a different branch (in all cases the intermediate levels can be made nontrivial). But what they all have in common with foo() is that the block that has $x in its pad is an ancestor to all of them. So I think we'd want the calls to bar() and baz() to see the value of $x visible to foo(), but I'm not quite sure how they would find it. Neither of them seems to have any real immediate lexical-parent pad to link to that would eventually lead them to $x.

But I guess this takes us right back to the rest of the discussion you addressed:

> But what about inner named subs?
>
>     { my $x = something(); sub foo { $x; } }
>
> If I understand things, the sub foo {$x;} is not actually compiled into a callable function until run time. At which time, a pad containing $x exists, which can be referenced by sub when converting {$x;} into a Code object bound to the package variable foo.

I'm pretty sure that's wrong. sub is a compile-time macro that will always run at compile-time and force a compilation of its block, whatever that means in the context of its enclosing lexical environment (that is, I'm precisely unsure of what that means). In fact, I believe a compiled Perl6 program should never compile anything at runtime unless you do an explicit eval() call. But I'll be glad to have myself corrected on this if I'm wrong.
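The dynamic-stack scheme proposed earlier in this message can be sketched as follows. This is a minimal model in Python under stated assumptions, not a description of any Perl 6 implementation: `Block`, `Pad`, `enter`, `leave`, and `lookup` are hypothetical names, and the chain of nested bare blocks around bar() in the contrived example is collapsed into one block called `bare`.

```python
class Block:
    """Hypothetical static description of a lexical block."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # lexically enclosing Block, known at compile time


class Pad:
    """Hypothetical runtime pad created each time a block is entered."""

    def __init__(self, block, outer):
        self.block = block
        self.outer = outer
        self.vars = {}


stack = []  # runtime stack of pads for every dynamically entered scope


def enter(block):
    # OUTER is the most recent pad on the stack that belongs to our
    # lexical parent, or None if no such pad is currently live.
    outer = None
    if block.parent is not None:
        outer = next(
            (p for p in reversed(stack) if p.block is block.parent), None
        )
    pad = Pad(block, outer)
    stack.append(pad)
    return pad


def leave():
    stack.pop()


def lookup(pad, name):
    while pad is not None:
        if name in pad.vars:
            return pad.vars[name]
        pad = pad.outer
    raise NameError(name)


# Criss-cross example: the block calls bar(), which calls foo().
top = Block("top")
outer_block = Block("outer_block", top)
foo = Block("foo", outer_block)
bar = Block("bar", top)

enter(top)
enter(outer_block).vars["$x"] = 1
enter(bar)
foo_pad = enter(foo)
seen = lookup(foo_pad, "$x")  # found via the stack search, despite bar()
leave()  # leave foo
leave()  # leave bar

# Contrived example: this bar() is declared inside bare blocks that are
# not live on the stack when it is called.
bare = Block("bare", outer_block)
deep_bar = Block("deep_bar", bare)
deep_pad = enter(deep_bar)
orphaned = deep_pad.outer is None

print(seen, orphaned)
```

In this model the criss-cross call finds $x, while the deeply nested sub ends up with no OUTER at all, which is the difficulty raised with the contrived example.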
Finally, on closures:

> When {$x++;} is evaluated as a closure, it is for all intents and purposes a function, with its own linked-list of pads. The head pad in the list contains nothing, and the next pad (the outer pad belonging to the function) contains $x. Since the head pad survives the call, and it has a reference on the outer pad containing $x, that outer pad survives as well. However, since nothing else points to it, the value of that particular $x is only visible to invokers of the closure returned.

Ah, so you're saying that pads aren't explicitly cloned, they're just referenced so they wouldn't go away when the blocks that created them exit. Hmm, that's pretty nice (and the easiest thing in the world to implement), but isn't that a little wasteful? I mean, those pads may have 100 lexical variables in them, but my closure is only ever going to look at 3 of them (and I know that at compile-time, by parsing its lexical variable/function/operator references), yet the other 97 values stick around, too?

-- avva
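As a point of comparison for the question above, CPython takes the selective route: the compiler works out which enclosing variables a closure references and keeps only those alive. The sketch below is Python, shown purely as an illustration of that design choice and not as a statement about Perl 6. The function and variable names are hypothetical; `__closure__` and `co_freevars` are standard CPython attributes.

```python
def make_closure():
    a, b, c = 1, 2, 3
    unused = list(range(97))  # stands in for the other 97 lexicals

    def closure():
        # Only a, b and c are referenced, so only they are captured.
        return a + b + c

    return closure


f = make_closure()
captured = set(f.__code__.co_freevars)  # names found at compile time
cells = len(f.__closure__)              # one cell per captured variable

print(sorted(captured), cells, f())
```

The trade-off is that each captured variable needs its own indirection (a cell) instead of one reference to the whole pad, so the pad-by-reference approach described above is simpler but retains more.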