Author: chip
Date: Thu Nov  3 11:08:34 2005
New Revision: 9756

Added:
   trunk/docs/pdds/pdd20_lexical_vars.pod
Log:
Design, and implementation hints, for lexical variables.

Added: trunk/docs/pdds/pdd20_lexical_vars.pod
==============================================================================
--- (empty file)
+++ trunk/docs/pdds/pdd20_lexical_vars.pod      Thu Nov  3 11:08:34 2005
@@ -0,0 +1,294 @@
+# Copyright: 2001-2005 The Perl Foundation.  All Rights Reserved.
+# $Id$
+
+=head1 NAME
+
+docs/pdds/pdd20_lexical_vars.pod - Lexical variables
+
+=head1 VERSION
+
+$Revision$
+
+=head1 ABSTRACT
+
+This document defines the requirements and implementation strategy for
+lexically scoped variables.
+
+=head1 SYNOPSIS
+
+    .sub foo
+        .lex "$a", P0
+        P1 = new Integer
+        P1 = 13013
+        store_lex "$a", P1
+        print P0            # prints 13013
+    .end
+
+    .sub bar :outer(foo)
+        P0 = find_lex "$a"  # may succeed; depends on closure creation
+    .end
+
+    .sub baz
+        P0 = find_lex "$a"  # guaranteed to fail: no .lex, no :outer()
+    .end
+
+    .sub corge
+        print "hi"
+    .end                    # no .lex and no :lex, thus: no PadInfo, no Pad
+
+
+    # Lexical behavior varies by HLL.  For example,
+    # Tcl's lexicals are not declared at compile time.
+
+    .HLL "Tcl", "tcl_group"
+
+    .sub grault :lex        # without ":lex", Tcl subs have no lexicals
+        P0 = find_lex "x"   # FAILS
+
+        P0 = new Integer    # really TclInteger
+        P0 = 42
+        store_lex "x", P0   # creates lexical "x"
+
+        P0 = find_lex "x"   # SUCCEEDS
+    .end
+
+=head1 DESCRIPTION
+
+"Lexical scoping" (a.k.a. "static scoping") is a term with many
+explanations and examples across computer science.  (And I'm sure
+I've only seen a fraction of them.)
+
+For Parrot purposes, "lexical variables" are those stored in a hash
+(or hash-like) PMC associated with a subroutine invocation, a.k.a.
+call frame.
+
+=head1 CONCEPTUAL MODEL
+
+=head2 Pad PMC
+
+Lexicals are stored in PMCs called "Pads".  The Pad interface is a
+slight extension of the Hash interface.  Lexical variables are
+conceptually key/value pairs, with string keys and PMC values.
+
+Pad objects are accessible to user code when necessary.  But normal
+lexical variable usage does not require a Pad reference.  Instead,
+specialized opcodes implement the common use cases.  Pads contain
+references to their associated PadInfos.
+
+(Specialized opcodes are particularly a Good Idea because most lexical
+usage involves searching more than one Pad, so a single Pad reference
+is not as useful as it might seem.  And, of course, opcodes can cheat
+... er, can be written in optimized C.  :-))
+
+Pad keys are unique.  Therefore, in each subroutine, there can be only
+one lexical variable with a given name.
+
+=head2 PadInfo PMC
+
+At compile time, each newly created Subroutine structure (or
+Subroutine derivative, e.g.  Closure) is populated with a PMC of
+HLL-mapped type PadInfo.  (Note that this type may actually be Null in
+some HLLs, e.g. Tcl.)  PadInfo PMCs are the interface through which
+the PIR compiler communicates compile-time information about lexical
+variables.
+
+A PadInfo represents what is known about lexicals at compile time
+(e.g. variable names, perhaps variable types, etc.), while a Pad
+represents what becomes known at run time (values).
+
+=head2 Lookup strategy
+
+If Parrot is asked to access a lexical variable named $var, Parrot
+uses the following strategy.  Note that fetch and store use the
+exact same approach.
+
+Parrot starts with the currently executing subroutine $sub, then
+loops through these steps:
+
+  1. Starting at the current call frame, walk back until an active
+     frame is found that is executing $sub.  Call it $frame.
+
+     (NOTE: The first time through, $sub is the current subroutine
+            and $frame is the currently live frame.)
+
+  2. Look for $var in $frame.pad.
+
+     FIXME - is "$frame.pad.fetch($var)" too confusing a way to
+             write this?
+
+  3. If the given pad contains $var, fetch/store it and
+     REPORT SUCCESS.
+
+  4. Set $sub to $sub.outer.  (That is, the textually enclosing
+     subroutine.)  But if $sub has no outer sub, REPORT FAILURE.
+
+=head2 Pad and PadInfo are optional; the ":lex" attribute
+
+Parrot does not assume that every subroutine needs lexical variables.
+Therefore, Parrot defaults to I<not> creating PadInfo or Pad PMCs for
+a given subroutine.  It only creates them when it first encounters a
+".lex" directive in the subroutine.  If no such directive is found,
+Parrot does not create a PadInfo for it at compile time, nor a Pad for
+it at run time.
+
+However, an absence of ".lex" directives is normal for some languages
+(e.g. Tcl) which lack compile-time knowledge of lexicals.  For these
+languages, the additional Subroutine attribute ":lex" should be
+specified.  It tells Parrot to create PadInfo and Pads even though no
+lexicals are declared.
+
+=head2 Closures
+
+FIXME: Describe the current closure mechanism
+
+=head2 HLL Type Mapping
+
+The implementation of lexical variables in the PIR compiler depends on
+two new PMCs: Pad and PadInfo.  However, the default Parrot Pad and
+PadInfo PMCs will not meet the needs of all languages.  They should
+suit Perl 6, for example, but not Tcl.
+
+Therefore, it is expected that HLLs will map the Pad and PadInfo types
+to something more appropriate (e.g. TclPad and TclPadInfo).  That
+mapping will automatically occur when the appropriate ".HLL" directive
+is in force.
+
+Using Tcl as an extreme example: TclPad will likely be a thin veneer
+on PMCHash.  TclPadInfo will likely map to Null.  Tcl provides no
+reliable compile-time information about lexicals; without any
+compile-time information to store, there's no need for TclPadInfo to
+do anything interesting.
+
+=head2 Nested Subroutines Have Outies; the ":outer" attribute
+
+For HLLs that support nested subroutines, Parrot provides a way to
+denote that a given subroutine is conceptually "inside" another.
+Lookup for lexical variables starts at the current call frame and
+proceeds through call frames that invoke "outer" subroutines.  The
+specific meaning of "outer" is defined below, but it's designed to
+support the common linguistic structure of nested subroutines where
+inner subs refer to lexical variables contained in outer blocks.
+
+Note that "outer" and "caller" are very different concepts!  For
+example, given the Perl 6 code:
+
+   sub foo {
+      my $a = 1;
+      my sub a { eval '$a' }
+      return &a;
+   }
+
+The &foo subroutine is the outer subroutine of &a, but it is not
+the caller of &a.
+
+In the above example, the definition of the Parrot subroutine
+implementing &a must include a notation that it is textually enclosed
+within &foo.  This is a static attribute of a Subroutine, set at
+compile time and never changed thereafter.  (Unless you're evil, or
+Damian.  But I repeat myself.)  This information is given through a
+":outer()" subroutine attribute, e.g.:
+
+    .sub a :outer(foo)
+
+=head1 PAD AND PADINFO REQUIRED INTERFACES
+
+=head2 PadInfo
+
+Below are the standard PadInfo methods that all HLL PadInfo PMCs may
+support.  Each PadInfo PMC should only define the methods that it can
+usefully implement, so the compiler can use method lookup failure to
+generate useful diagnostics (e.g. "register aliasing not supported by
+Tcl lexicals").
+
+Each language's PadInfo will implement methods that are helpful to
+that language's Pad.  In the extreme case, PadInfo can be Null -- but
+if it is, the given HLL should not generate any ".lex*" directives.
+
+=over 4
+
+=item void init(PMC *sub, Context *ctx)
+
+Called exactly once.
+
+ FIXME: Is it kosher to pass a Context pointer?  Probably not.
+        Possible fix: replace it with a documentation guarantee that
+        the new current Context will already exist at init() time;
+        then init() can grab it from the Interp.
+
+ TODO: Can init() take parameters like this?  If not, rename.
+
+=item PMC *sub()
+
+Return the associated Subroutine.
+
+=item void declare_lex(STRING *name)
+
+Declare a lexical variable.  The PIR compiler calls this method in
+response to a C<.lex STRING> directive.
+
+=item void declare_lex_preg(STRING *name, INTVAL preg)
+
+Declare a lexical variable that is an alias for a PMC register.  The
+PIR compiler calls this method in response to a C<.lex STRING, PREG>
+directive.  For example, given this preamble:
+
+    .lex "$a", $P0
+    $P1 = new Integer
+
+These two opcodes have identical effect:
+
+    $P0 = $P1
+    store_lex "$a", $P1
+
+Likewise, these two opcodes have identical effect:
+
+    $P1 = $P0
+    $P1 = find_lex "$a"
+
+=back
+
+=head2 Pad
+
+Pads start by implementing the Hash interface: variable names are
+string keys, and variable values are PMCs.
+
+In addition, Pads must implement the following methods:
+
+=over 4
+
+=item init(PMC *padinfo)
+
+Called exactly once.
+
+TODO: Can init() take parameters like this?  If not, rename.
+
+=item PMC *padinfo()
+
+Return the associated PadInfo.
+
+=back
+
+=head1 DEFAULT PARROT PAD AND PADINFO
+
+The default PadInfo supports lexicals only as aliases for PMC
+registers.  It therefore implements declare_lex_preg(), but not
+declare_lex().  (Internally, it could be a Hash of some kind, where
+keys are String variable names and values are integer register
+numbers.)
+
+The default Pad (like all Pads) implements the Hash interface.  When
+asked to look up a variable, it finds the corresponding register
+number by querying its associated PadInfo.  It then gets or sets the
+given numbered register in its associated Parrot Context structure.
+
+=head1 ATTACHMENTS
+
+None.
+
+=head1 FOOTNOTES
+
+None.
+
+=head1 REFERENCES
+
+None.
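
Not part of the patch: the loop under "Lookup strategy" may be easier to follow as a small executable model. The Python sketch below is only an illustration of those four steps, not Parrot code. The names Sub, Frame, find_pad, find_lex and store_lex are hypothetical stand-ins, a pad is modeled as a plain dict, and the call stack as a list with the current frame last. The document does not say what happens when no active frame is executing $sub; this sketch assumes the search simply moves on to the next outer sub. HLL-specific behavior, such as Tcl's store_lex creating a missing lexical, is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sub:
    """Hypothetical model of a subroutine; `outer` stands for :outer()."""
    name: str
    outer: Optional["Sub"] = None


@dataclass
class Frame:
    """Hypothetical call frame; `pad` is None when the sub has no Pad."""
    sub: Sub
    pad: Optional[Dict[str, object]] = None


def find_pad(frames: List[Frame], var: str) -> Optional[Dict[str, object]]:
    """Return the pad that holds `var`, or None to report failure.

    `frames` is the call stack, oldest first; frames[-1] is the current frame.
    """
    sub: Optional[Sub] = frames[-1].sub
    while sub is not None:
        # Step 1: from the current frame, walk back to a frame executing sub.
        frame = next((f for f in reversed(frames) if f.sub is sub), None)
        # Steps 2 and 3: look for var in that frame's pad.
        # Assumption: a missing frame or missing pad just means "keep going".
        if frame is not None and frame.pad is not None and var in frame.pad:
            return frame.pad
        # Step 4: move to the textually enclosing sub, if there is one.
        sub = sub.outer
    return None


def find_lex(frames: List[Frame], var: str) -> object:
    """Fetch `var`, raising KeyError when the lookup reports failure."""
    pad = find_pad(frames, var)
    if pad is None:
        raise KeyError(var)
    return pad[var]


def store_lex(frames: List[Frame], var: str, value: object) -> None:
    """Store into an existing `var`; uses the same lookup as find_lex."""
    pad = find_pad(frames, var)
    if pad is None:
        raise KeyError(var)
    pad[var] = value


# "outer" is not "caller": sub a is called from an unrelated sub,
# yet its lookup of $a resolves in the frame of its outer sub, foo.
foo = Sub("foo")
a = Sub("a", outer=foo)
other = Sub("other")
stack = [Frame(foo, {"$a": 1}), Frame(other, {"$a": 99}), Frame(a, {})]

assert find_lex(stack, "$a") == 1
store_lex(stack, "$a", 2)
assert stack[0].pad["$a"] == 2
```

In this model the frame for "other" is skipped even though it sits between a and foo on the stack, which is the distinction the ":outer" section draws between an outer sub and a caller.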
