Author: chip
Date: Thu Nov 3 11:08:34 2005
New Revision: 9756
Added:
trunk/docs/pdds/pdd20_lexical_vars.pod
Log:
Design, and implementation hints, for lexical variables.
Added: trunk/docs/pdds/pdd20_lexical_vars.pod
==============================================================================
--- (empty file)
+++ trunk/docs/pdds/pdd20_lexical_vars.pod Thu Nov 3 11:08:34 2005
@@ -0,0 +1,294 @@
+# Copyright: 2001-2005 The Perl Foundation. All Rights Reserved.
+# $Id$
+
+=head1 NAME
+
+docs/pdds/pdd20_lexical_vars.pod - Lexical variables
+
+=head1 VERSION
+
+$Revision$
+
+=head1 ABSTRACT
+
+This document defines the requirements and implementation strategy for
+lexically scoped variables.
+
+=head1 SYNOPSIS
+
+ .sub foo
+ .lex "$a", P0
+ P1 = new Integer
+ P1 = 13013
+ store_lex "$a", P1
+ print P0 # prints 13013
+ .end
+
+ .sub bar :outer(foo)
+ P0 = find_lex "$a" # may succeed; depends on closure creation
+ .end
+
+ .sub baz
+ P0 = find_lex "$a" # guaranteed to fail: no .lex, no :outer()
+ .end
+
+ .sub corge
+ print "hi"
+ .end # no .lex and no :lex, thus: no PadInfo, no Pad
+
+
+ # Lexical behavior varies by HLL. For example,
+ # Tcl's lexicals are not declared at compile time.
+
+ .HLL "Tcl", "tcl_group"
+
+ .sub grault :lex # without ":lex", Tcl subs have no lexicals
+ P0 = find_lex "x" # FAILS
+
+ P0 = new Integer # really TclInteger
+ P0 = 42
+ store_lex "x", P0 # creates lexical "x"
+
+ P0 = find_lex "x" # SUCCEEDS
+ .end
+
+=head1 DESCRIPTION
+
+"Lexical scoping" (a.k.a. "static scoping") is a term with many
+explanations and examples across computer science. (And I'm sure
+I've only seen a fraction of them.)
+
+For Parrot purposes, "lexical variables" are those stored in a hash
+(or hash-like) PMC associated with a subroutine invocation, a.k.a.
+call frame.
+
+=head1 CONCEPTUAL MODEL
+
+=head2 Pad PMC
+
+Lexicals are stored in PMCs called "Pads". The Pad interface is a
+slight extension of the Hash interface. Lexical variables are
+conceptually key/value pairs, with string keys and PMC values.
+
+Pad objects are accessible to user code when necessary. But normal
+lexical variable usage does not require a Pad reference. Instead,
+specialized opcodes implement the common use cases. Pads contain
+references to their associated PadInfos.
+
+(Specialized opcodes are particularly a Good Idea because most lexical
+usage involves searching more than one Pad, so a single Pad reference
+is not as useful as it might seem. And, of course, opcodes can cheat
+... er, can be written in optimized C. :-))
+
+Pad keys are unique. Therefore, in each subroutine, there can be only
+one lexical variable with a given name.
+
+=head2 PadInfo PMC
+
+At compile time, each newly created Subroutine structure (or
+Subroutine derivative, e.g. Closure) is populated with a PMC of
+HLL-mapped type PadInfo. (Note that this type may actually be Null in
+some HLLs, e.g. Tcl.) PadInfo PMCs are the interface through which
+the PIR compiler communicates compile-time information about lexical
+variables.
+
+A PadInfo represents what is known about lexicals at compile time
+(e.g. variable names, perhaps variable types, etc.), while a Pad
+represents what becomes known at run time (values).
+
+=head2 Lookup strategy
+
+If Parrot is asked to access a lexical variable named $var, it uses
+the following strategy. Note that fetch and store use exactly the
+same approach.
+
+Parrot starts with the currently executing subroutine $sub, then
+loops through these steps:
+
+ 1. Starting at the current call frame, walk back until an active
+ frame is found that is executing $sub. Call it $frame.
+
+ (NOTE: The first time through, $sub is the current subroutine
+ and $frame is the currently live frame.)
+
+ 2. Look for $var in $frame.pad.
+
+ FIXME - is "$frame.pad.fetch($var)" too confusing a way to
+ write this?
+
+ 3. If the given pad contains $var, fetch/store it and
+ REPORT SUCCESS.
+
+ 4. Set $sub to $sub.outer. (That is, the textually enclosing
+ subroutine.) But if $sub has no outer sub, REPORT FAILURE.
+
+=head2 Pad and PadInfo are optional; the ":lex" attribute
+
+Parrot does not assume that every subroutine needs lexical variables.
+Therefore, Parrot defaults to I<not> creating PadInfo or Pad PMCs for
+a given subroutine. It only creates them when it first encounters a
+".lex" directive in the subroutine. If no such directive is found,
+Parrot does not create a PadInfo for it at compile time, nor a Pad for
+it at run time.
+
+However, an absence of ".lex" directives is normal for some languages
+(e.g. Tcl) which lack compile-time knowledge of lexicals. For these
+languages, the additional Subroutine attribute ":lex" should be
+specified. It tells Parrot to create PadInfo and Pads even though no
+lexicals are declared.
+
+=head2 Closures
+
+FIXME: Describe the current closure mechanism
+
+=head2 HLL Type Mapping
+
+The implementation of lexical variables in the PIR compiler depends on
+two new PMCs: Pad and PadInfo. However, the default Parrot Pad and
+PadInfo PMCs will not meet the needs of all languages. They should
+suit Perl 6, for example, but not Tcl.
+
+Therefore, it is expected that HLLs will map the Pad and PadInfo types
+to something more appropriate (e.g. TclPad and TclPadInfo). That
+mapping will automatically occur when the appropriate ".HLL" directive
+is in force.
+
+Using Tcl as an extreme example: TclPad will likely be a thin veneer
+on PMCHash. TclPadInfo will likely map to Null. Tcl provides no
+reliable compile-time information about lexicals; without any
+compile-time information to store, there's no need for TclPadInfo to
+do anything interesting.
+
+=head2 Nested Subroutines Have Outies; the ":outer" attribute
+
+For HLLs that support nested subroutines, Parrot provides a way to
+denote that a given subroutine is conceptually "inside" another.
+Lookup for lexical variables starts at the current call frame and
+proceeds through call frames that invoke "outer" subroutines. The
+specific meaning of "outer" is defined below, but it's designed to
+support the common linguistic structure of nested subroutines where
+inner subs refer to lexical variables contained in outer blocks.
+
+Note that "outer" and "caller" are very different concepts! For
+example, given the Perl 6 code:
+
+ sub foo {
+ my $a = 1;
+ my sub a { eval '$a' }
+ return &a;
+ }
+
+The &foo subroutine is the outer subroutine of &a, but it is not
+the caller of &a.
+
+In the above example, the definition of the Parrot subroutine
+implementing &a must include a notation that it is textually enclosed
+within &foo. This is a static attribute of a Subroutine, set at
+compile time and never changed thereafter. (Unless you're evil, or
+Damian. But I repeat myself.) This information is given through a
+":outer()" subroutine attribute, e.g.:
+
+ .sub a :outer(foo)
+
+=head1 PAD AND PADINFO REQUIRED INTERFACES
+
+=head2 PadInfo
+
+Below are the standard PadInfo methods that all HLL PadInfo PMCs may
+support. Each PadInfo PMC should only define the methods that it can
+usefully implement, so the compiler can use method lookup failure to
+generate useful diagnostics (e.g. "register aliasing not supported by
+Tcl lexicals").
+
+Each language's PadInfo will implement methods that are helpful to
+that language's Pad. In the extreme case, PadInfo can be Null -- but
+if it is, the given HLL should not generate any ".lex*" directives.
+
+=over 4
+
+=item void init(PMC *sub, Context *ctx)
+
+Called exactly once.
+
+ FIXME: Is it kosher to pass a Context pointer? Probably not.
+ Possible fix: replace it with a documentation guarantee that
+ the new current Context will already exist at init() time;
+ then init() can grab it from the Interp.
+
+ TODO: Can init() take parameters like this? If not, rename.
+
+=item PMC *sub()
+
+Return the associated Subroutine.
+
+=item void declare_lex(STRING *name)
+
+Declare a lexical variable. The PIR compiler calls this method in
+response to a C<.lex STRING> directive.
+
+=item void declare_lex_preg(STRING *name, INTVAL preg)
+
+Declare a lexical variable that is an alias for a PMC register. The
+PIR compiler calls this method in response to a C<.lex STRING, PREG>
+directive. For example, given this preamble:
+
+ .lex "$a", $P0
+ $P1 = new Integer
+
+These two opcodes have identical effect:
+
+ $P0 = $P1
+ store_lex "$a", $P1
+
+Likewise, these two opcodes have identical effect:
+
+ $P1 = $P0
+ $P1 = find_lex "$a"
+
+=back
+
+=head2 Pad
+
+Pads start by implementing the Hash interface: variable names are
+string keys, and variable values are PMCs.
+
+In addition, Pads must implement the following methods:
+
+=over 4
+
+=item void init(PMC *padinfo)
+
+Called exactly once.
+
+TODO: Can init() take parameters like this? If not, rename.
+
+=item PMC *padinfo()
+
+Return the associated PadInfo.
+
+=back
+
+=head1 DEFAULT PARROT PAD AND PADINFO
+
+The default PadInfo supports lexicals only as aliases for PMC
+registers. It therefore implements declare_lex_preg(), but not
+declare_lex(). (Internally, it could be a Hash of some kind, where
+keys are String variable names and values are integer register
+numbers.)
+
+The default Pad (like all Pads) implements the Hash interface. When
+asked to look up a variable, it finds the corresponding register
+number by querying its associated PadInfo. It then gets or sets the
+given numbered register in its associated Parrot Context structure.
+
+=head1 ATTACHMENTS
+
+None.
+
+=head1 FOOTNOTES
+
+None.
+
+=head1 REFERENCES
+
+None.