On Thu, Jun 14, 2012 at 07:27:09AM -0400, Johnicholas Hines wrote:
> What would a self-interpreter for this Femto-ML look like?
> Something like Mogensen's self-interpreter?
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.89.9933

(Torben Mogensen: I am Ccing you in case you want to respond, perhaps to point
to a corrected version of the paper, since some of my remarks about your paper
are snarky, which is unfair, since the part of the paper I'm talking about is
peripheral to its main points, to extend a sentence far past its natural
lifetime.  I won't be offended if you don't feel the need to respond, but I
feel like it would be unfair not to give you the opportunity.  Further context,
if you're interested, is at
<http://lists.canonical.org/pipermail/kragen-tol/2012-June/000958.html>.)

Johnicolas: maybe.  I'm trying to understand the self-interpreter you're
referring to, impeded somewhat by the variety of errors Mogensen left in the
paper, perhaps as an extra challenge to readers.  To clarify the discussion,
here's his syntax, from the paper:

    Program → TypeDeclaration∗ FunctionDeclaration+ [sic, he means 
FunDeclaration+]
    TypeDeclaration → data TypeId = ConstructorDeclaration
    ConstructorDeclaration → ConstructorId TypeId∗
                           | ConstructorId TypeId∗ | ConstructorDeclaration

    FunDeclaration → FunctionId VariableId∗ = Expression
    Expression → VariableId
               | FunctionId Expression∗
               | ConstructorId Expression∗
               | case Expression of Branch+

    Branch → ConstructorId VariableId∗ => Expression

I note that he seems to have forgotten parentheses, and that his grammar is
in any case quite ambiguous; there's no way to tell where the VariableIds and
FunctionIds in an Expression in a FunDeclaration end, and where the FunctionId
and VariableIds of the following FunDeclaration begin.  In his example
programs, he relies on whitespace (either newlines or indentation) to make this
clear to the reader.

And here's the interpreter, which is much easier to understand if you read the
grammar first:

    data num   = Z | S num
    data univ  = Con num ulist
    data ulist = Un | Uc univ ulist
    data funs  = Fn | Fun exp funs
    data exp   = Var num | Fap num elist | Cap num elist | Case exp elist
    data elist = En | Ec exp elist

    run p args = apply Z p args p

    apply f fs args p =
      case fs of
        Fn       => Con Z Un
        Fun e fs => case f of
                         Z   => eval e args p
                         S f => apply f fs args p

    eval e vs p =
      case e of
        Var n      => lookup n vs
        Fap n es   => apply n p (evallist es vs p) p
        Cap n es   => Con n (evallist es vs p)
        Case e es  => case (eval e vs p) of
                          Con n vs1 => eval (select n es) (append vs1 vs) p

    evallist es vs p =
      case es of
        En      => Un
        Ec e es => Uc (eval e vs p) (evallist es vs p)

    select n es =
      case es of
        En      => Cap Z En
        Ec e es => case n of
                       Z   => e
                       S n => select n es

Morgensen remarks, "The program is represented by a list of un-names [sic]
functions, which in function calls are referred to by position. Similarly,
variables and constructors are represented by their positions in declarations.
Type declarations are not explicitly represented in the interpreter. The
self-interpreter is shown in figure 4. Missing from this are definitions of
`lookup`, `append`, `hd` and `tl`, which we have omitted for reasons of space."

The missing `append` is relatively straightforward, once you figure out which
of the redundant list types it needs to be:

    append vs1 vs =
      case vs1 of
        Un       => vs
        Uc v vs2 => Uc v (append vs2 vs)

`lookup` is presumably exactly the same as `select` but with a different list
type:

    lookup n vs =
      case es of
        Un      => Cap Z En
        Uc v vs => case n of
                       Z   => v
                       S n => select n vs

`hd` and `tl` are not actually used in the above code, so I have omitted them.
The y do get used later in the paper.

It's a bit of a cheat to call this a "self-interpreter", because, in his
enthusiasm to write an interpreter that neither provides nor requires equality
testing, he has written an interpreter that requires a representation of the
program fairly far removed from the abstract syntax tree produced by a parser.
All the variables must be replaced with de Bruijn indices, and all of the
case-expressions must be padded out with the addition of extra expressions
(perhaps `Cap Z En`) to cover the constructors for which the program does not
specify an expression.  

Since the compiler needed to do these transformations from the AST is
comparable in size to the interpreter, perhaps the interpreter should instead
be called a "virtual machine".  And the fact that it's written in the language
it interprets is entirely irrelevant to the interesting parts of the paper.

Mogensen's miniature language doesn't include nested patterns, equality tests
in the pattern-matcher, or higher-order functions (or, consequently, closures
or currying), so I suspect that a FeML metacircular interpreter would probably
be substantially more code, even though the language is syntactically simpler.
However, like Mogensen's example, it could omit type-checking entirely.

Kragen
-- 
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss

Reply via email to