Forwarded, slightly edited, with permission.

----- Forwarded message from Amit Patel <[EMAIL PROTECTED]> -----
From: Amit Patel <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: PEGs

Hi Kragen,

Neat article! [referring to http://github.com/~kragen/peg-bootstrap/tree/master/peg.md or http://considerate.murch-sitaker.org/~kragen/peg.md.html -kjs]

On http://lambda-the-ultimate.org/node/3039, Josh Haberman argues that PEGs aren't as cool as I and everyone else think they are. I don't have enough experience with them to know. For Yapps, I wrote parts of the parser by hand, and it was enough to parse the grammar for Yapps, so after a few iterations I had Yapps producing its own parser. That was fun :)

For parsers in general, I'm so happy that ANTLR and packrat parsing got people interested again. It seemed like the field was dead, even though there is so much more to do. Several areas I'm interested in, but don't plan to work on anytime soon:

- Abstraction. When I write a rule and code to handle a comma-separated list of numbers, then write it all again for a semicolon-separated list of CSS rules, and again for a period-separated list of names, I want some abstraction List(separator, element) that not only parses those lists but also produces the output value.

- Inheritance. (A different form of abstraction.) It might be useful to take a grammar, add to it, and override existing rules by a mechanism similar to inheritance in classes. For example, VAX C let you use $ in identifiers. It would be neat to take an existing grammar for C, inherit from it, and redefine the Token rule to also allow $.

- Libraries. (This goes along with abstraction.) There are lots of patterns common enough to put into a library. The List() rule is one, of course, but nesting, whether (), [], or {}, and strings (single, double, triple quotes, escaping, etc.) would also be nice to have in a library.

- Better theory. In its purest form, parser theory tells you whether a string matched. In practice, we want to extract parts of it and compute functions of those parts.
This means the grammars aa* and a*a, although equivalent in theory (they match the same set of strings), are actually different once you take extracting data into account. So I think the theorists started with the wrong problem, and I'd like to see theory that addresses the problem parsers are actually used to solve. Maybe it won't make any difference; I don't know. But I have a bad feeling about the foundations.

- Error handling. (This goes along with better theory.) Parsers tend to be fine when things go right, but I haven't seen any good theory of handling errors. In practice, it seems that people use heuristics and hacks.

- Diff parsing. (This is related to errors.) In theory, you run a parser on a string and then you stop. In practice, you run a parser on a string and then run it again on a modified string. For example, if my input is "(if (foo) bar)" and I change it to "(if (foo bar)", then by looking at the change I made in the editor, you can tell that the error is likely to be that foo should have ')' after it. But the parser, having forgotten the previous version of the string, will tell me the error is "missing )" at the very end of the string. The history can tell you something about where the error occurred. Looking at it another way: suppose you have the last successful parse, and you know which parts of the parse tree correspond to which parts of the input string. When the input string changes, try to preserve the parse tree to the left of and to the right of the change. That gives you a subtree within which you expect errors to occur, and the missing ')' likely belongs there, not at the very end (the rightmost leaf of the tree).

- Amit

----- End forwarded message -----
--
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss
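[Editor's note: the List(separator, element) abstraction Amit wishes for can be sketched with a few hand-rolled parser combinators. This is a minimal illustration of the idea, not the API of any existing tool; every function name here is made up for the example. The key point from the email is that the rule produces the list of element values, not just a match/no-match answer.]

```python
def token(s):
    """Match a literal string, returning (value, remaining_input) or None."""
    def parse(text):
        if text.startswith(s):
            return s, text[len(s):]
        return None
    return parse

def number(text):
    """Match a run of digits and produce its integer value."""
    i = 0
    while i < len(text) and text[i].isdigit():
        i += 1
    if i == 0:
        return None
    return int(text[:i]), text[i:]

def sep_list(separator, element):
    """List(separator, element): parse `element (separator element)*`
    and produce the list of element values."""
    def parse(text):
        first = element(text)
        if first is None:
            return [], text  # zero elements is still a successful parse
        value, rest = first
        values = [value]
        while True:
            sep = separator(rest)
            if sep is None:
                break
            nxt = element(sep[1])
            if nxt is None:
                break  # trailing separator: leave it unconsumed
            value, rest = nxt
            values.append(value)
        return values, rest
    return parse

# The same abstraction serves commas, semicolons, periods, ...
numbers = sep_list(token(","), number)
print(numbers("1,2,3"))  # → ([1, 2, 3], '')
```

Swapping in a CSS-rule or name parser for `number`, or a different separator token, reuses the same rule and the same value-building logic, which is exactly the duplication the bullet complains about.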