Re: Syntax explainer, phase 2: planning

Moritz Lenz Wed, 30 Jan 2008 10:37:35 -0800

Larry Wall wrote:
> On Wed, Jan 30, 2008 at 04:08:04PM +0100, Moritz Lenz wrote:
> : About half a year ago I posted my idea of a program that explains Perl 6
> : syntax:
> : 
> : http://www.nntp.perl.org/group/perl.perl6.users/2007/07/msg621.html
> : 
> : Differing from my first post, I now think that the best idea is to
> : really parse a Perl 6 program with a fully fledged parser, and emit some
> : kind of markup language that contains annotations that explain the
> : semantics of each token.
> : 
> : Now you all know the story: "nothing but perl can parse Perl", and of
> : course I'm lazy, so I'd like to reuse an existing parser.
> : 
> : The most appealing idea so far is to use rakudo's grammar for
> : experimenting, and later on STD.pm for the "real thing".
> : 
> : The simplest option is to use a grammar, and write a different action
> : class for it (the one whose methods are executed when {*} action stubs
> : are found in the grammar), and instead of returning a syntax tree, I
> : just return a data structure that contains the position, a description
> : of the token, and the actual text.
> : 
> : That works fine - until the grammar is changed. So I need to execute
> : BEGIN blocks, which implies that I need the "normal" parse tree as well.
> : D'oh.
> 
> Let me correct an oversimplification here.  Most grammar changes
> will *not* be done by BEGIN blocks.  BEGIN blocks (like eval) are a
> tool of last resort; they're only there for when it's impossible to
> achieve what you want by ordinary means.  Perl 6 is very much about
> providing more ordinary means for things that used to have to be done
> by BEGIN or eval.


Correction accepted.
I should replace "BEGIN blocks" with "anything that happens at compile time".

> Instead, grammar changes will be done by using a module that derives
> a grammar from STD.  The derived grammar will be defined the same way
> the original grammar is, so there is no change of the basic underlying
> rules here.  If you find a sane way of dealing with STD you should be
> able to deal with its derivatives just as easily.  Unlike BEGIN blocks,
> grammar warping modules come with names and versions and authorities,
> so when you warp your language by calling "use", you are doing so in
> a controlled fashion, and your new language can still be deterministic,
> and produce a well-behaved AST.

... and ideally derived grammars will come with additional documentation
that overrides the STD.pm annotations. Sounds like a plan.

> : Do you have any idea how I may circumvent the problem?
> : 
> : I had some thoughts, but none appear to be a good solution:
> :  * build two trees, one normal AST for the BEGIN block evaluation, and
> : one parse tree for the markup output.
> :  * subclass the normal action class, and annotate the AST with enough
> : information, and as a second step, after all BEGIN blocks were executed,
> : filter out the interesting information.
> :  * parse the BEGIN blocks with the normal grammar and action class, and
> : the rest with the modified action class that emits the markup.
> : 
> : Actually I have no idea if any of these could work. Any thoughts?
> 
> From my MAD experiences, I'd say the only guaranteed correct way is to
> annotate the existing AST, and to make sure that the standard grammar
> mechanism has all the hooks you need to do that.

Ok, then I'll do that.

Question to the rakudo hackers: are the hooks there yet?
The start and end position of the token plus the token name and a key would
be enough; alternatively, the start position plus a unique key should work
as well.
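A minimal sketch of what such an annotation record could look like, and how a syntax explainer might turn it into markup. It is written in Python purely for illustration, not Perl 6. `TokenAnnotation`, `to_markup` and `DESCRIPTIONS` are hypothetical names. The description table stands in for the token-name/description DB proposed further down in this thread. The `<span>` output is just one possible markup. The sketch assumes flat, non-overlapping annotations.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class TokenAnnotation:
    """Hypothetical annotation record: where a token is and which rule matched it."""
    start: int  # offset of the token's first character in the source
    end: int    # offset one past the token's last character
    name: str   # name of the grammar rule that matched, e.g. "sigil"


# Hypothetical stand-in for a text-based DB mapping token names to descriptions.
DESCRIPTIONS = {
    "sigil": "Sigil: marks the container type of a variable",
    "identifier": "Identifier: the name part of a variable",
}


def to_markup(source, annotations, descriptions):
    """Wrap each annotated token in a <span> whose title carries its description.

    Assumes annotations do not overlap (no nesting); unannotated text is copied
    through escaped. Unknown token names fall back to the rule name itself.
    """
    parts = []
    pos = 0
    for ann in sorted(annotations, key=lambda a: a.start):
        if ann.start < pos:
            raise ValueError("overlapping annotations are not supported")
        parts.append(escape(source[pos:ann.start]))
        text = escape(source[ann.start:ann.end])
        desc = escape(descriptions.get(ann.name, ann.name), quote=True)
        parts.append(f'<span class="{ann.name}" title="{desc}">{text}</span>')
        pos = ann.end
    parts.append(escape(source[pos:]))
    return "".join(parts)
```

For the source `my $x;`, annotating offsets 3..4 as `sigil` and 4..5 as `identifier` yields the two tokens wrapped in spans, with `my ` and `;` passed through unchanged.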

> The big evil in the
> Perl 5 parser is that it was continually forgetting things.  It does
> this by lying to itself about what it saw.  Or in more moderate terms
> "replace this AST with that AST".  So when you talk about trying to
> maintain a separate AST, I shudder with horror.  It's impossible.
> So never replace.  Always augment and annotate.  It will save your
> sanity, and stop the flame wars about forcing people to program in
> the One True Language.  Perl 6 is not about that.  It's about being a
> metalanguage in which you can express many languages, and doing so in
> a sufficiently controlled fashion that we always know what language
> any given lexical scope is expressed in.  And if we truly know what
> language we're parsing at any moment, we can do everything PPI does
> without much extra work, and without enforcing arbitrary linguistic
> restrictions.
> 
> If the current {*} hack is insufficiently powerful for you to
> annotate the AST correctly, then we need to negotiate a better hack.  :)

I think the {*} hack can be made sufficiently powerful, but it requires
additional work. For example, currently you can't tell from looking at $/
which token/regex/rule it came from.
You can work around it by adding that information in every action
method, but that's boring work and no fun.
Maybe a :trace modifier could annotate that automatically?
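The manual workaround (recording the rule name in every action method) can be automated from outside the grammar, because action methods are named after the rules they handle. Below is a rough Python analogue of what a `:trace`-style modifier might do. `Match`, `trace_actions` and `Actions` are hypothetical stand-ins for $/ and an action class. This is not how Rakudo or PGE actually work.

```python
import functools


class Match:
    """Hypothetical stand-in for a match object ($/)."""

    def __init__(self, text, start):
        self.text = text
        self.start = start
        self.rule = None  # which rule produced this match; unknown by default


def _traced(rule_name, method):
    # Separate factory so each wrapper captures its own rule_name.
    @functools.wraps(method)
    def wrapper(self, match, *args, **kwargs):
        match.rule = rule_name  # stamp the rule name before the real action runs
        return method(self, match, *args, **kwargs)
    return wrapper


def trace_actions(cls):
    """Class decorator: wrap every public action method so the match it
    receives is tagged with the method's name, i.e. the rule's name."""
    for name, member in list(vars(cls).items()):
        if callable(member) and not name.startswith("_"):
            setattr(cls, name, _traced(name, member))
    return cls


@trace_actions
class Actions:
    # Hypothetical action methods; note that neither one records its rule name.
    def variable(self, match):
        return ("var", match.text)

    def integer(self, match):
        return ("int", int(match.text))
```

After `Actions().variable(m)`, `m.rule` is `"variable"` even though the action method itself never mentions it. That is the boring bookkeeping moved into one place.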

> : A second problem is that the information should be accessible for
> : perldoc. Since the documentation synopsis is indefinitely pending, I
> : don't really want to rely on perldoc syntax, especially because the data
> : has to be accessible from the action class.
> : This could be circumvented by another abstraction layer (for example a
> : text-based DB that contains unique token names and their descriptions, and
> : that DB could be used both by the action class and to emit some perldoc).
> : Are there better ideas, perhaps even some that don't introduce more
> : layers? ;-)
> : 
> : Any comments are welcome.
> 
> This seems to me to primarily be a naming problem, and the AST gives
> you the naming path to get to any particular node.  

Not in the detail level that I want, no. At least not in the general case.

You can't know from the AST if something was matched by <foo=bar> or by
<bar>, and any closure can make() $/ something completely different.
And (<.foo>) leaves no trace that could be used to identify the matching
regex.

I don't know if that's a problem in reality, or just an academic one.

I just ran

    ../../parrot perl6.pbc --target=past t/01-sanity/02-counter.t

and it seems that I'm able to reconstruct the basic structure (I can
identify operators and variables and their position in the source code,
for example), but it stores variables like this:

PMC 'PAST::Var'  {
    <name> => "$counter"
    <viviself> => "Undef"
    <source> => "$counter"
    <pos> => 192
}

That's probably all you need for the compiler, but it doesn't go into
the details, for example that '$counter' is made of a sigil and an
identifier.
Is it overkill for a normal compilation to keep that information? Or
could we add that?
Or is such a detail level overkill even for a syntax explainer?
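To show the extra detail a syntax explainer would want, here is a sketch that derives sub-token positions from the `<source>` and `<pos>` fields of the PAST::Var dump above. It is in Python for illustration only. `split_variable` is a hypothetical helper. It recognizes only the sigils `$ @ % &` and an optional single twigil, and does not handle special variables such as `$/` or `$!`.

```python
SIGILS = "$@%&"
TWIGILS = "*.!^:?=~"  # Perl 6 twigils, e.g. the * in $*OUT


def split_variable(source, pos):
    """Split a variable such as "$counter" found at offset `pos` into
    (token_name, start, end) sub-tokens: sigil, optional twigil, identifier.

    Assumption: `source` is exactly the variable text, as in the PAST dump;
    special variables like $/ or $! are not handled.
    """
    if not source or source[0] not in SIGILS:
        raise ValueError(f"not a variable: {source!r}")
    tokens = [("sigil", pos, pos + 1)]
    offset = 1
    if len(source) > 1 and source[1] in TWIGILS:
        tokens.append(("twigil", pos + 1, pos + 2))
        offset = 2
    tokens.append(("identifier", pos + offset, pos + len(source)))
    return tokens
```

For the node above, `split_variable("$counter", 192)` gives a sigil at 192..193 and an identifier at 193..200. This is information the compiler already had while parsing, but which the PAST node no longer carries.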

> The main thing
> you want is some way of naming the top of the AST from within a CHECK
> block (or from anywhere else you need to access the structure of the
> program from).  Possibly this is a part of the %=FOO set of variables,
> and we have $=AST or some such to go along with the %=POD variables.

... and I could write my syntax explainer (and possibly highlighter) as a
pure Perl 6 module without having to poke the compiler, just by
inspecting the AST? That would be superb!

> Anyway, IDEs, syntax highlighters, and refactoring engines are all
> going to want to access the same information, and we intend to make it
> possible for them to do that.  That is at the very heart of Perl 6, and
> the main reason it's so important for Perl 6 to be parsed in Perl 6.

++

Thanks,
Moritz

-- 
Moritz Lenz
http://moritz.faui2k3.org/ |  http://perl-6.de/
