Allen Wirfs-Brock wrote:
I'm again considering creating a line-by-line translation of the ES6
spec algorithms into an executable evaluator of parse trees.
Cool. So my work might help you do the translation programmatically.
this would be a non-normative translation of the spec. that could
be used to test the algorithms.
I'd be interesting in hearing about the details of what you have been
working on.
Here's a quick overview:
(1)
The first stage is to massage the ES spec into something easier for me
to work with. I start with the MS Word file, import it into OpenOffice,
and save as HTML. The resulting markup is fairly crufty and
presentational, so I proceed to tweak the markup (via regex and
dom-hacking) until it more-or-less reflects the semantic structure of
the spec (using an ad hoc XML vocabulary).
For example, consider the second rule on page 22 of the current PDF
(I tried to find something small but interesting), whose text is:
The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits is
the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits times
10–n), where n is the number of characters in DecimalDigits.
For this rule, the HTML from OpenOffice is:
<LI><P LANG="en-GB" STYLE="margin-top: 0.01in; margin-bottom: 0in; line-height:
100%">
<FONT FACE="Times New Roman, serif"><FONT SIZE=2><FONT FACE="Arial,
sans-serif">The</FONT><FONT FACE="Arial, sans-serif">
MV of </FONT><I>DecimalLiteral </I><FONT FACE="Arial,
sans-serif"><B>::</B></FONT><B>
</B><I>DecimalIntegerLiteral </I><FONT FACE="Courier New"><B>.</B></FONT><I>
DecimalDigits</I><I><B> </B></I><FONT FACE="Arial, sans-serif">is
the MV of </FONT><I>DecimalIntegerLiteral</I> <FONT FACE="Arial,
sans-serif">plus
(the MV of </FONT><I>DecimalDigits</I><I><B> </B></I><FONT FACE="Arial,
sans-serif">times
</FONT>10<SUP>–</SUP><SUP><I>n</I></SUP><FONT FACE="Arial,
sans-serif">),
where </FONT><I>n</I><FONT FACE="Arial, sans-serif"> is the number
of characters in </FONT><I>DecimalDigit</I>s.</FONT></FONT></P>
Which I eventually turn into:
<li>
<para> The MV of
<prod1 n_colons="2"><lhs><nt>DecimalLiteral</nt></lhs>
<rhs>
<nt>DecimalIntegerLiteral</nt>
<t>.</t>
<nt>DecimalDigits</nt>
</rhs>
</prod1>
is the MV of <nt>DecimalIntegerLiteral</nt> plus
(the MV of <nt>DecimalDigits</nt> times
10<sup>-<var>n</var></sup>), where <var>n</var>
is the number of characters in <nt>DecimalDigits</nt>.
</para>
</li>
Here,
-- A <prod1> is a production where only one RHS (of possibly many)
appears.
-- An <nt> is a non-terminal.
-- A <t> is a terminal.
-- A <var> is a metavariable [i.e. a variable in the pseudo-code].
Note that both nonterminals and metavariables are serif italic in the
original, so when the massaging code sees a serif italic word, it has to
make a decision between <nt> and <var>.
Note also that the final 's' in the final "DecimalDigits" is outside the
<i> element in the HTML. (You can see it in the PDF too, if you enlarge
it sufficiently.) At various points in the process, I have lots of little
tweaks to fix things like this, so that they get converted correctly.
(Perhaps similar to the "preposterous concatenation of hacks" referred to
at http://people.mozilla.org/~jorendorff/es5.html .) In some cases, I've
pointed out the problem at bugs.ecmascript.org, but mostly I haven't.
(When the draft spec is made available in .docx format, I may recode
this whole stage. There are some oddities in the HTML that take a lot of
work to deal with, and I think they might be introduced by OpenOffice.)
(2)
The second stage is (roughly speaking) to find <para>s that contain
pseudo-code, and convert them into a form that says the same thing, but
reflects the syntactic structure in a way that's easily machine-parsable.
For the example rule, the pretty-printed result is:
(
(
_the_MV_of
(
(_nt DecimalLiteral)
_::
[(_nt DecimalIntegerLiteral) (_t '.') (_nt DecimalDigits)]
)
)
_IS
(
(
(_the_MV_of (_nt DecimalIntegerLiteral))
_plus
(
(_the_MV_of (_nt DecimalDigits))
_times
(10 _to_the_power (_negative $n))
)
)
_where
$n
_is
(_the_number_of_characters_in (_nt DecimalDigits))
)
)
(All the parentheses make it look like LISP, but that's mostly a
coincidence.)
I invite you to read the original rule and the above form, and confirm
that they say the same thing, using almost exactly the same words, but
the latter makes explicit the syntactic grouping that (the spec assumes)
a human reader can infer correctly in their head.
In addition to delimiting phrases with parentheses, the above form also
identifies the 'key words' of each phrase, and precedes each with an
underscore (gluing together multiple such if adjacent).
(3)
The third stage is basically an interpreter for the language in which
the above forms appear to be written.
(a) It reads in all the parenthesized forms, "parses" them (though the
parsing is almost trivial), and builds an internal data structure. E.g.,
(10 _to_the_power (_negative $n))
is represented by a form whose 'key' is 'X_to_the_power_X', and whose
two 'items' are the number 10 and the sub-form for (_negative $n).
(b) It reads in all the ES productions, and builds a parser for the ES
language.
(c) It reads in some ES code, parses it as a Program according to the
parser constructed in (b), and then tries to evaluate the Program using
the structure it constructed in (a).
Some keys are primitive, e.g.
X_to_the_power_X
negative_X
X_where_X_is_X
and so are implemented directly in the interpreter. Other keys are
defined by other rules, e.g.
the_MV_of_X
and so cause a search for an appropriate rule to execute.
----
I've got all of stage 1 done (though I have to adjust it slightly for
each new draft). Stages 2 and 3 are in progress. I pick a simple ES
program, then run the meta-interpreter until it hits a phrase that
either wasn't (really) translated (in stage 2), or isn't implemented (in
stage 3). I fix that, re-run, and things get a little farther. Sometimes
it even completes!
-Michael
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss