Re: Syntax explainer, phase 2: planning

2008-01-30 Thread Larry Wall
On Wed, Jan 30, 2008 at 04:08:04PM +0100, Moritz Lenz wrote:
: About half a year ago I posted my idea of a program that explains Perl 6
: syntax:
: 
: http://www.nntp.perl.org/group/perl.perl6.users/2007/07/msg621.html
: 
: Differing from my first post I now think that the best idea is to
: really parse a Perl 6 program with a fully fledged parser, and emit some
: kind of markup language that contains annotations that explain the
: semantics of each token.
: 
: Now you all know the story: nothing but perl can parse Perl, and of
: course I'm lazy, so I'd like to reuse an existing parser.
: 
: The most appealing idea so far is to use rakudo's grammar for
: experimenting, and later on STD.pm for the real thing.
: 
: The simplest option is to use a grammar, and write a different action
: class for it (the one whose methods are executed when {*} action stubs
: are found in the grammar), and instead of returning a syntax tree, I
: just return a data structure that contains the position, a description
: of the token, and the actual text.
: 
: That works fine - until the grammar is changed. So I need to execute
: BEGIN blocks, which implies that I need the normal parse tree as well.
: D'oh.

Let me correct an oversimplification here.  Most grammar changes
will *not* be done by BEGIN blocks.  BEGIN blocks (like eval) are a
tool of last resort; they're only there for when it's impossible to
achieve what you want by ordinary means.  Perl 6 is very much about
providing more ordinary means for things that used to have to be done
by BEGIN or eval.

Instead, grammar changes will be done by using a module that derives
a grammar from STD.  The derived grammar will be defined the same way
the original grammar is, so there is no change of the basic underlying
rules here.  If you find a sane way of dealing with STD you should be
able to deal with its derivatives just as easily.  Unlike BEGIN blocks,
grammar warping modules come with names and versions and authorities,
so when you warp your language by calling use, you are doing so in
a controlled fashion, and your new language can still be deterministic,
and produce a well-behaved AST.
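
To make the point about derived grammars concrete, here is a minimal sketch in Python rather than Perl 6. It assumes nothing about how STD is actually implemented: `StdGrammar`, `UnlessGrammar`, and `explain` are hypothetical names, and a flat token table stands in for a real grammar. All it tries to show is that a tool written against the base grammar keeps working when a derived grammar adds a rule in the same way the base rules are defined.

```python
import re


class StdGrammar:
    """Hypothetical stand-in for STD: an ordered table of (rule name, regex)."""

    rules = [
        ("number", r"\d+"),
        ("identifier", r"[A-Za-z_]\w*"),
        ("operator", r"[-+*/=<>]"),
    ]

    @classmethod
    def tokens(cls, source):
        """Yield (rule name, start offset, text) for each token in source."""
        pos = 0
        while pos < len(source):
            if source[pos].isspace():
                pos += 1
                continue
            for name, pattern in cls.rules:
                match = re.compile(pattern).match(source, pos)
                if match:
                    yield name, pos, match.group()
                    pos = match.end()
                    break
            else:
                raise SyntaxError(f"no rule matches at offset {pos}")


class UnlessGrammar(StdGrammar):
    """Hypothetical derived grammar: defined like the base, plus one rule."""

    rules = [("keyword", r"unless\b")] + StdGrammar.rules


def explain(grammar, source):
    """Written once against the base; works unchanged on any derivative."""
    return [f"{pos}: {name} {text!r}" for name, pos, text in grammar.tokens(source)]
```

In this sketch, `explain(UnlessGrammar, "unless x > 1")` labels `unless` as a keyword, while `explain(StdGrammar, "unless x > 1")` labels the same text as an identifier. The `explain` function itself is not touched by the derivation.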

: Do you have any idea how I may circumvent the problem?
: 
: I had some thoughts, but none appear to be a good solution:
:  * build two trees, one normal AST for the BEGIN block evaluation, and
: one parse tree for the markup output.
:  * subclass the normal action class, and annotate the AST with enough
: information, and as a second step, after all BEGIN blocks were executed,
: filter out the interesting information.
:  * parse the BEGIN blocks with the normal grammar and action class, and
: the rest with the modified action class that emits the markup.
: 
: Actually I have no idea if any of these could work. Any thoughts?

From my MAD experiences, I'd say the only guaranteed correct way is to
annotate the existing AST, and to make sure that the standard grammar
mechanism has all the hooks you need to do that.  The big evil in the
Perl 5 parser is that it was continually forgetting things.  It does
this by lying to itself about what it saw.  Or in more moderate terms
replace this AST with that AST.  So when you talk about trying to
maintain a separate AST, I shudder with horror.  It's impossible.
So never replace.  Always augment and annotate.  It will save your
sanity, and stop the flame wars about forcing people to program in
the One True Language.  Perl 6 is not about that.  It's about being a
metalanguage in which you can express many languages, and doing so in
a sufficiently controlled fashion that we always know what language
any given lexical scope is expressed in.  And if we truly know what
language we're parsing at any moment, we can do everything PPI does
without much extra work, and without enforcing arbitrary linguistic
restrictions.

If the current {*} hack is insufficiently powerful for you to
annotate the AST correctly, then we need to negotiate a better hack.  :)
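
A minimal sketch of "never replace, always augment and annotate", again in Python and under stated assumptions: the `Node` class, its `notes` field, and the `DESCRIPTIONS` table are all hypothetical and are not taken from rakudo or STD. The annotation pass only adds entries to a side dictionary on each existing node, so the tree the compiler built stays exactly as it was.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    rule: str    # name of the grammar rule that matched
    start: int   # offset of the first character
    end: int     # offset just past the last character
    children: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)  # annotations, added later


# Hypothetical description table; real text would come from documentation.
DESCRIPTIONS = {
    "infix": "binary operator",
    "integer": "integer literal",
}


def annotate(node, source):
    """Attach source text and a description to every node, in place."""
    node.notes["text"] = source[node.start:node.end]
    node.notes["doc"] = DESCRIPTIONS.get(node.rule, "(no description yet)")
    for child in node.children:
        annotate(child, source)
```

Because `annotate` returns nothing and creates no new nodes, there is no second tree that could drift out of sync with the first.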

: A second problem is that the information should be accessible for
: perldoc. Since the documentation synopsis is indefinitely pending, I
: don't really want to rely on perldoc syntax, especially because the data
: has to be accessible from the action class.
: This could be circumvented by another abstraction layer (for example a
: text based DB that contains unique token names and the description, and
: that DB could be used both by the action class and to emit some perldoc).
: Are there better ideas, perhaps even some that don't introduce more
: layers? ;-)
: 
: Any comments are welcome.

This seems to me to primarily be a naming problem, and the AST gives
you the naming path to get to any particular node.  The main thing
you want is some way of naming the top of the AST from within a CHECK
block (or from anywhere else you need to access the structure of the
program from).  Possibly this is a part of the %=FOO set of variables,
and we have $=AST or some such to go along with the %=POD variables.
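
The "naming path" idea can be sketched as follows, in Python and with a hypothetical tree shape: each node is a dict with a `text` entry and an optional `subs` dict that maps a capture name to one child or a list of children. Neither that layout nor the `TOP` prefix is claimed to match what `$=AST` would really expose.

```python
def named_paths(node, prefix="TOP"):
    """Yield (path, node) for every node in a nested match tree."""
    yield prefix, node
    for name, sub in node.get("subs", {}).items():
        if isinstance(sub, list):
            # Repeated captures get an index so each path stays unique.
            for i, item in enumerate(sub):
                yield from named_paths(item, f"{prefix}.{name}[{i}]")
        else:
            yield from named_paths(sub, f"{prefix}.{name}")
```

A documentation lookup could then be keyed on paths such as `TOP.variable.sigil`, provided the paths carry enough detail for the purpose.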

Anyway, IDEs, 

Re: Syntax explainer, phase 2: planning

2008-01-30 Thread Moritz Lenz
Larry Wall wrote:
 On Wed, Jan 30, 2008 at 04:08:04PM +0100, Moritz Lenz wrote:
 : About half a year ago I posted my idea of a program that explains Perl 6
 : syntax:
 : 
 : http://www.nntp.perl.org/group/perl.perl6.users/2007/07/msg621.html
 : 
 : Differing from my first post I now think that the best idea is to
 : really parse a Perl 6 program with a fully fledged parser, and emit some
 : kind of markup language that contains annotations that explain the
 : semantics of each token.
 : 
 : Now you all know the story: nothing but perl can parse Perl, and of
 : course I'm lazy, so I'd like to reuse an existing parser.
 : 
 : The most appealing idea so far is to use rakudo's grammar for
 : experimenting, and later on STD.pm for the real thing.
 : 
 : The simplest option is to use a grammar, and write a different action
 : class for it (the one whose methods are executed when {*} action stubs
 : are found in the grammar), and instead of returning a syntax tree, I
 : just return a data structure that contains the position, a description
 : of the token, and the actual text.
 : 
 : That works fine - until the grammar is changed. So I need to execute
 : BEGIN blocks, which implies that I need the normal parse tree as well.
 : D'oh.
 
 Let me correct an oversimplification here.  Most grammar changes
 will *not* be done by BEGIN blocks.  BEGIN blocks (like eval) are a
 tool of last resort; they're only there for when it's impossible to
 achieve what you want by ordinary means.  Perl 6 is very much about
 providing more ordinary means for things that used to have to be done
 by BEGIN or eval.

Correction accepted.
I should replace "BEGIN blocks" with "anything that happens at compile time".

 Instead, grammar changes will be done by using a module that derives
 a grammar from STD.  The derived grammar will be defined the same way
 the original grammar is, so there is no change of the basic underlying
 rules here.  If you find a sane way of dealing with STD you should be
 able to deal with its derivatives just as easily.  Unlike BEGIN blocks,
 grammar warping modules come with names and versions and authorities,
 so when you warp your language by calling use, you are doing so in
 a controlled fashion, and your new language can still be deterministic,
 and produce a well-behaved AST.

... and ideally derived grammars will come with additional documentation
that overrides the STD.pm annotations. Sounds like a plan.

 : Do you have any idea how I may circumvent the problem?
 : 
 : I had some thoughts, but none appear to be a good solution:
 :  * build two trees, one normal AST for the BEGIN block evaluation, and
 : one parse tree for the markup output.
 :  * subclass the normal action class, and annotate the AST with enough
 : information, and as a second step, after all BEGIN blocks were executed,
 : filter out the interesting information.
 :  * parse the BEGIN blocks with the normal grammar and action class, and
 : the rest with the modified action class that emits the markup.
 : 
 : Actually I have no idea if any of these could work. Any thoughts?
 
 From my MAD experiences, I'd say the only guaranteed correct way is to
 annotate the existing AST, and to make sure that the standard grammar
 mechanism has all the hooks you need to do that.

Ok, then I'll do that.

Question to the rakudo hackers: are the hooks there yet?
Start position and end position of the token + token name + key would be
enough, or start position + a unique key should work as well.

 The big evil in the
 Perl 5 parser is that it was continually forgetting things.  It does
 this by lying to itself about what it saw.  Or in more moderate terms
 replace this AST with that AST.  So when you talk about trying to
 maintain a separate AST, I shudder with horror.  It's impossible.
 So never replace.  Always augment and annotate.  It will save your
 sanity, and stop the flame wars about forcing people to program in
 the One True Language.  Perl 6 is not about that.  It's about being a
 metalanguage in which you can express many languages, and doing so in
 a sufficiently controlled fashion that we always know what language
 any given lexical scope is expressed in.  And if we truly know what
 language we're parsing at any moment, we can do everything PPI does
 without much extra work, and without enforcing arbitrary linguistic
 restrictions.
 
 If the current {*} hack is insufficiently powerful for you to
 annotate the AST correctly, then we need to negotiate a better hack.  :)

I think the {*} hack can be made sufficiently powerful, but it requires
additional work; for example, currently you can't tell from looking at $/
which token/regex/rule it comes from.
You can work around it by adding that information in every action
method, but that's boring work and no fun.
Maybe a modifier :trace could annotate that automatically?
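
What an automatic annotation of that kind might do can be sketched in Python with a decorator. This is an assumption about the desired behaviour, not a description of PGE or of any existing `:trace` modifier: `rule`, `TinyGrammar`, and the match dict layout are hypothetical. The rule name is recorded once, in the wrapper, instead of by hand in every action method.

```python
import functools
import re


def rule(func):
    """Wrap a rule method so every match it returns records the rule's name."""
    @functools.wraps(func)
    def wrapper(self, source, pos):
        match = func(self, source, pos)
        if match is not None:
            match["rule"] = func.__name__  # filled in here, not per action
        return match
    return wrapper


class TinyGrammar:
    """Hypothetical two-rule grammar, only to exercise the decorator."""

    @rule
    def sigil(self, source, pos):
        m = re.compile(r"[$@%&]").match(source, pos)
        return {"from": m.start(), "to": m.end(), "text": m.group()} if m else None

    @rule
    def identifier(self, source, pos):
        m = re.compile(r"[A-Za-z_]\w*").match(source, pos)
        return {"from": m.start(), "to": m.end(), "text": m.group()} if m else None
```

Each successful match then carries its start, end, text, and originating rule, which is the set of fields asked for above.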

 : A second problem is that the information should be accessible for
 : perldoc. Since the documentation synopsis is indefinitely pending, I
 

Re: Syntax explainer, phase 2: planning

2008-01-30 Thread Moritz Lenz
Moritz Lenz wrote:
 I just ran
 ../../parrot perl6.pbc --target=past t/01-sanity/02-counter.t
  and it seems that I'm able to reconstruct the basic structure (I can
 identify operators and variables and their position in the source code,
 for example), but for example it stores variables this way:
 
 PMC 'PAST::Var'  {
 name = $counter
 viviself = Undef
 source = $counter
 pos = 192
 }
 
 That's probably all you need for the compiler, but it doesn't go into
 the details, for example that '$counter' is made of a sigil and an
 identifier.
 Is it overkill for a normal compilation to keep that information? Or
 could we add that?
 Or is such a detail level overkill even for a syntax explainer?

Uhm, forget that part;-)

particle++ told me to try --target=parse instead, and that's pretty much
verbose and all I should ever need ;-)
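
The difference between the two targets can be illustrated with a small Python sketch. The dict shapes below are hypothetical, loosely modelled on the `PAST::Var` dump quoted above, and are not the actual output of `--target=parse` or `--target=past`. The parse node keeps the sigil and identifier as separate children; lowering it to an AST-like record keeps only what a compiler needs.

```python
# Hypothetical parse node for '$counter' at offset 192.
parse_node = {
    "rule": "variable", "pos": 192, "text": "$counter",
    "subs": [
        {"rule": "sigil", "pos": 192, "text": "$", "subs": []},
        {"rule": "identifier", "pos": 193, "text": "counter", "subs": []},
    ],
}


def lower(node):
    """Collapse a parse node into an AST-like record, dropping its children."""
    return {"type": "Var", "name": node["text"], "pos": node["pos"]}


def leaves(node):
    """Yield (rule, pos, text) for each leaf of a parse node."""
    if not node["subs"]:
        yield node["rule"], node["pos"], node["text"]
    for sub in node["subs"]:
        yield from leaves(sub)
```

A syntax explainer would walk something like `leaves(parse_node)`; after `lower`, the sigil and identifier boundaries can no longer be recovered from the record alone.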

-- 
Moritz Lenz
http://moritz.faui2k3.org/ |  http://perl-6.de/





Re: Syntax explainer, phase 2: planning

2008-01-30 Thread jerry gay
On Jan 30, 2008 10:36 AM, Moritz Lenz wrote:

 Larry Wall wrote:
  On Wed, Jan 30, 2008 at 04:08:04PM +0100, Moritz Lenz wrote:
  : Do you have any idea how I may circumvent the problem?
  :
  : I had some thoughts, but none appear to be a good solution:
  :  * build two trees, one normal AST for the BEGIN block evaluation, and
  : one parse tree for the markup output.
  :  * subclass the normal action class, and annotate the AST with enough
  : information, and as a second step, after all BEGIN blocks were executed,
  : filter out the interesting information.
  :  * parse the BEGIN blocks with the normal grammar and action class, and
  : the rest with the modified action class that emits the markup.
  :
  : Actually I have no idea if any of these could work. Any thoughts?
 
  From my MAD experiences, I'd say the only guaranteed correct way is to
  annotate the existing AST, and to make sure that the standard grammar
  mechanism has all the hooks you need to do that.

 Ok, then I'll do that.

 Question to the rakudo hackers: are the hooks there yet?
 Start position and end position of the token + token name + key would be
 enough, or start position + a unique key should work as well.

well, you may have to dive into PIR to get at it, but it's all there.
for example, see the ws and afterws rules in the rakudo perl 6 grammar
file.

  The big evil in the
  Perl 5 parser is that it was continually forgetting things.  It does
  this by lying to itself about what it saw.  Or in more moderate terms
  replace this AST with that AST.  So when you talk about trying to
  maintain a separate AST, I shudder with horror.  It's impossible.
  So never replace.  Always augment and annotate.  It will save your
  sanity, and stop the flame wars about forcing people to program in
  the One True Language.  Perl 6 is not about that.  It's about being a
  metalanguage in which you can express many languages, and doing so in
  a sufficiently controlled fashion that we always know what language
  any given lexical scope is expressed in.  And if we truly know what
  language we're parsing at any moment, we can do everything PPI does
  without much extra work, and without enforcing arbitrary linguistic
  restrictions.
 
  If the current {*} hack is insufficiently powerful for you to
  annotate the AST correctly, then we need to negotiate a better hack.  :)

 I think the {*} hack can be made sufficiently powerful, but it requires
 additional work, for example currently you can't know from looking at $/
  which token/regex/rule it comes from.
 You can work around it by adding that information in every action
 method, but that's boring work and no fun.
 Maybe a modifier :trace could annotate that automatically?

yes, pge is missing the ability to know the name of the rule it's
currently inside. i'd like it because it'd make debugging and error
message generation easier, so it's on my list of things to implement,
but it's not there yet. i suppose a TODO ticket in RT couldn't
hurt.

  : A second problem is that the information should be accessible for
  : perldoc. Since the documentation synopsis is indefinitely pending, I
  : don't really want to rely on perldoc syntax, especially because the data
  : has to be accessible from the action class.
  : This could be circumvented by another abstraction layer (for example a
  : text based DB that contains unique token names and the description, and
  : that DB could be used both by the action class and to emit some perldoc).
  : Are there better ideas, perhaps even some that don't introduce more
  : layers? ;-)
  :
  : Any comments are welcome.
 
  This seems to me to primarily be a naming problem, and the AST gives
  you the naming path to get to any particular node.

 Not in the detail level that I want, no. At least not in the general case.

 You can't know from the AST if something was matched by <foo=bar> or by
 <bar>, and any closure can make() $/ something completely different.
 And (.foo) leaves no trace that could be used to identify the matching
 regex.

 I don't know if that's a problem in reality, or just an academic one.

 I just ran
 ../../parrot perl6.pbc --target=past t/01-sanity/02-counter.t
  and it seems that I'm able to reconstruct the basic structure (I can
 identify operators and variables and their position in the source code,
 for example), but for example it stores variables this way:

 PMC 'PAST::Var'  {
 name = $counter
 viviself = Undef
 source = $counter
 pos = 192
 }

 That's probably all you need for the compiler, but it doesn't go into
 the details, for example that '$counter' is made of a sigil and an
 identifier.
 Is it overkill for a normal compilation to keep that information? Or
 could we add that?
 Or is such a detail level overkill even for a syntax explainer?

your problem here is too much abstraction. by the time you're dealing
with abstract syntax tree, you've lost some syntactic info. you want
--target=parse instead, which 

Re: Syntax explainer, phase 2: planning

2008-01-30 Thread Larry Wall
On Wed, Jan 30, 2008 at 07:47:18PM +0100, Moritz Lenz wrote:
: particle++ told me to try --target=parse instead, and that's pretty much
: verbose and all I should ever need ;-)

Hmm, yes, but *only* if that switch merely augments information without
destroying information, and doesn't otherwise change how things are
actually parsed.  Otherwise we're back to the separate-but-equal fallacy.
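
The condition stated here can be written down as a check. The Python sketch below assumes a hypothetical tree layout (dicts with `rule`, `from`, `to`, and `subs`) and does not reflect what `--target=parse` actually emits. It treats any key other than those four as an annotation and asks whether the verbose tree, with annotations stripped, is identical to the plain one.

```python
def strip_notes(node):
    """Return a copy of the node with every annotation removed, recursively."""
    return {
        "rule": node["rule"],
        "from": node["from"],
        "to": node["to"],
        "subs": [strip_notes(sub) for sub in node["subs"]],
    }


def only_augments(plain, verbose):
    """True if verbose is the plain tree plus extra keys and nothing else."""
    return strip_notes(verbose) == strip_notes(plain)
```

If `only_augments` ever returned False for the same source text, the verbose mode would be parsing differently rather than merely adding information.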

Larry