Forwarded, slightly edited, with permission.

----- Forwarded message from Amit Patel <[EMAIL PROTECTED]> -----
From: Amit Patel <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: PEGs

Hi Kragen,

Neat article! [referring to http://github.com/~kragen/peg-bootstrap/tree/master/peg.md or http://considerate.murch-sitaker.org/~kragen/peg.md.html -kjs]

On http://lambda-the-ultimate.org/node/3039, Josh Haberman argues that PEGs aren't as cool as I and everyone else think they are. I don't have enough experience with them to know. For Yapps, I wrote parts of the parser by hand, and it was enough to parse the grammar for Yapps, so after a few iterations I had Yapps producing its own parser. That was fun :)

For parsers in general, I'm so happy that ANTLR and packrat parsing got people interested again. It seemed like the field was dead, even though there is so much more to do. Several areas I'm interested in, but don't plan to work on anytime soon:

- Abstraction. When I write a rule and code to handle a comma-separated list of numbers, then write it all again for a semicolon-separated list of CSS rules, and again for a period-separated list of names, I want some abstraction List(separator, element) that not only parses those lists but also produces the output value.

- Inheritance. (A different form of abstraction.) It might be useful to take a grammar, add to it, and override existing rules by a mechanism similar to inheritance in classes. For example, VAX C let you use $ in identifiers. It would be neat to take an existing grammar for C, inherit from it, and redefine the Token rule to also allow $.

- Libraries. (This goes along with abstraction.) There are lots of patterns common enough to put into a library. The List() rule is one, of course, but nesting, whether (), [], or {}, and strings (single, double, triple quotes, escaping, etc.) would also be nice to have in a library.

- Better theory. In its purest form, parser theory tells you whether a string matched. In practice, we want to extract parts of it and compute functions of those parts.
This means the grammars aa* and a*a, although equivalent in theory (they match the same set of strings), are actually different once you take extracting data into account. So I think the theorists started with the wrong problem, and I'd like to see theory that addresses the problem parsers are actually used to solve. Maybe it won't make any difference; I don't know. But I have a bad feeling about the foundations.

- Error handling. (This goes along with better theory.) Parsers tend to be fine when things go right, but I haven't seen any good theory of handling errors. In practice, it seems that people use heuristics and hacks.

- Diff parsing. (This is related to errors.) In theory, you run a parser on a string and then you stop. In practice, you run a parser on a string and then run it again on a modified string. For example, if my input is "(if (foo) bar)" and I change it to "(if (foo bar)", then by looking at the change I made in the editor, you can tell that the error is likely to be that foo should have ')' after it. But the parser, having forgotten the previous version of the string, will tell me the error is "missing )" at the very end of the string. The history can tell you something about where the error occurred. Looking at it another way: suppose you have the last successful parse, and you know which parts of the parse tree correspond to which parts of the input string. When the input string changes, try to preserve the parse tree to the left of and to the right of the change. That gives you a subtree within which you expect errors to occur, and the missing ')' likely belongs there, not at the very end (the rightmost leaf of the tree).

- Amit

----- End forwarded message -----
--
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss
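[Editor's note: the List(separator, element) abstraction Amit wishes for can be sketched with a few hand-rolled parser combinators. This is a minimal illustration of the idea, not the API of any existing tool; every function name here is made up for the example. The key point from the email is that the rule produces the list of element values, not just a match/no-match answer.]

```python
def token(s):
    """Match a literal string, returning (value, remaining_input) or None."""
    def parse(text):
        if text.startswith(s):
            return s, text[len(s):]
        return None
    return parse

def number(text):
    """Match a run of digits and produce its integer value."""
    i = 0
    while i < len(text) and text[i].isdigit():
        i += 1
    if i == 0:
        return None
    return int(text[:i]), text[i:]

def sep_list(separator, element):
    """List(separator, element): parse `element (separator element)*`
    and produce the list of element values."""
    def parse(text):
        first = element(text)
        if first is None:
            return [], text  # zero elements is still a successful parse
        value, rest = first
        values = [value]
        while True:
            sep = separator(rest)
            if sep is None:
                break
            nxt = element(sep[1])
            if nxt is None:
                break  # trailing separator: leave it unconsumed
            value, rest = nxt
            values.append(value)
        return values, rest
    return parse

# The same abstraction serves commas, semicolons, periods, ...
numbers = sep_list(token(","), number)
print(numbers("1,2,3"))  # → ([1, 2, 3], '')
```

Swapping in a CSS-rule or name parser for `number`, or a different separator token, reuses the same rule and the same value-building logic, which is exactly the duplication the bullet complains about.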