Your PEG work should be interesting and I plan to follow it. You made quite a number of other significant (non-PEG) points and I am aware that I still owe you responses to them. That's underway. It's taking some time because I am doing some reading for it.
best, jeffrey

On Wed, Feb 27, 2019 at 5:12 PM <[email protected]> wrote:

> Thanks for your answer!
>
> I do mostly agree, PEG is not a cure-all. But to make my opinion clear (and I do like to think it's an informed opinion rooted in experience), neither is it significantly worse in usability than CFGs.
>
> I must admit I missed that note in the footnotes. It doesn't do much for me in terms of nuance, honestly. It's in a reference after a phrase that reads "PEG, in safe use, would essentially be LL(1)". If that's the way of saying that "PEG has safe uses", then it isn't nuance, but rather a judgement that says that PEG is either dangerous or crippled (as LL(1) is).
>
> It all comes back to this idea that PEG is "dangerous". I'm open to discussing that — what I currently think is that it really means that it is **unintuitive** to some people (as I do explain in my answer to Leszek above).
>
> One of the (multiple) things that make me believe this is precisely the reference you cited in support of "PEG, in safe use, would essentially be LL(1)". I think this is a gross misrepresentation of the claims of the paper. I'd stake 99% confidence that if you asked the authors whether the paper says or shows that, they'd reply with an emphatic "no".
>
> Consider the abstract:
>
> > Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have several similarities and a few differences in both their syntax and semantics, but they are usually presented through formalisms that hinder a proper comparison. In this paper we present a new formalism for CFGs that highlights the similarities and differences between them. The new formalism borrows from PEGs the use of parsing expressions and the recognition-based semantics. We show how one way of removing non-determinism from this formalism yields a formalism with the semantics of PEGs. We also prove, based on these new formalisms, how LL(1) grammars define the same language whether interpreted as CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular grammars have simple language-preserving translations from CFGs to PEGs. Once these classes of CFGs can be automatically translated to equivalent PEGs, we can reuse classic top-down grammars in PEG-based tools.
>
> This essentially says that LL(1) grammars are the subset of both PEG and CFG that share the same behaviour.
>
> Hence your definition of "safe" seems to be "behaving similarly to CFGs".
>
> But like I said, and you said in turn, everyone is entitled to their own opinions.
>
> I do however think that if a sizeable portion of the parsing community agrees that PEGs are valuable, it might be worthwhile to point out that your opinion is, in fact, an opinion that is not consensual by any stretch.
>
> Similarly, why not mention some work building on PEG? In particular left-recursion handling and packrat parsing. Even if you do think they build on a flawed base, certainly we can agree these results are very significant for those of us who do use PEG?
>
> About quips, while I don't mind them in general, they seem to me here to be precisely an example of the "rope-pulling" that you decry. Wouldn't it be more high-minded to simply list facts and publications and let people form their own opinion?
>
> Now, why do I care?
> I don't much mind people being "wrong on the internet" (™) — that ship sailed long ago — but the idea of a parsing timeline is exciting to me. I'd love to recommend it and advertise it, but I find the bias towards the end prevents me from doing so. I think many people might share this opinion.
>
> The way I tend to perceive "timelines" and other "histories" is as somewhat authoritative and unbiased — at least, that's the platonic ideal. Think about the standard that Wikipedia tries to enforce, for instance. Wikipedia does not only list consensus — opinions and disputes are relevant, after all — but it clearly marks them as such.
>
> Do you think there is a way for the timeline to move in that direction?
>
> On Wednesday, February 20, 2019 at 1:53:31 AM UTC+1, Jeffrey Kegler wrote:
>>
>> Thanks for your kind words. In this, I'll just respond to the PEG points.
>>
>> That part of my timeline has caused more blowback than any other part, not unexpectedly. The quips are aimed not at people like yourself, who, while PEG optimists, know its difficulties and do not mislead others. Unfortunately, at the time I wrote that version of the timeline (and still, I suspect), you are in the minority of pro-PEG voices out on the Web.
>>
>> Many presentations of PEG present it as a "throw anything at me" solution. These less careful voices were the loudest and caused a lot of folks to waste time pushing a rope. My quips were aimed at catching the attention of these folks, and they did their job.
>>
>> I've done this before, with yacc, for the same reason, and caught the same kind of heat. People were routinely recommending yacc for "serious parsing", either in ignorance or reckless disregard of the actual experience with it. So I did a "bovicide" series of blog posts. Nowadays I don't diss yacc, because it's no longer the danger to parsing newbies that it was.
>>
>> If you read my entry carefully, and get beyond the quips (as I am sure you did), you see that I *do* say there are safe uses of PEG, and in the footnote concede that my description in the text is "negative". I go on to point the reader toward the pro-PEG literature -- a literature of which the less careful PEG advocates are usually unaware.
>>
>> So to my careless reader, a warning about PEG gets across, and to my careful reader, I point out that, while I am more optimistic about Earley/Leo, there is safe use of PEG and a literature worth consulting.
>>
>> In fact, I think there are 5 "forever" algorithms, parsing algorithms that are of permanent significance, and PEG is one of them. (The other 4 are regular expressions, recursive descent, Earley/Leo and Sakai's algorithm, aka CYK.)
>>
>> Getting to the merits, I've heard before that some find the PEG syntax more intuitive than BNF. Everyone is the ultimate authority on their own intuitions, so I won't dispute this. And you are more optimistic about research into PEG than I am, clearly. But as Yogi Berra says, "predictions are difficult, especially about the future". My timeline does not engage in predictions, although it certainly reflects my optimism wrt Earley/Leo. I do claim to be entitled to greater confidence in my predictions on the grounds that, at 65, I am much less likely to be around if they fall flat. :-)
>>
>> I'll tackle the points about why the timeline includes/excludes what it does separately.
>>
>> On Tue, Feb 19, 2019 at 5:47 AM <[email protected]> wrote:
>>
>>> (Following up from https://twitter.com/jeffreykegler/status/1097628806091816960)
>>>
>>> Dear Jeffrey,
>>>
>>> I was delighted when I found out about your parsing timeline a couple of months ago. I'm a PhD student researching parsing myself, and it's great to see resources like this bringing awareness to what happened in the field; it's a great reference for people in the field as well. It also got me more interested in Earley and your own work on it, which is quite cool.
>>>
>>> That being said, I do have some dissatisfactions with the timeline, especially the end of it. It's always easy to criticize, so I would like to offer my help instead, to see if I can contribute something useful to the timeline.
>>>
>>> As I see it, there are two points that could use improvement. First, I feel top-down recursive-descent formalizations, especially PEG, are being unfairly dismissed. Second, it seems to me that there are quite a few new developments that would deserve inclusion at the end of the timeline (especially by comparison to the level of detail towards the start of the timeline).
>>>
>>> I don't pretend to have the objective truth, but the goal is to have a productive conversation that can hopefully lead to improved accuracy.
>>>
>>> ---
>>>
>>> Regarding PEG, I find the whole description minus the first paragraph to be rather contestable and (I feel) uncharitable.
>>>
>>> It's entirely correct that PEG does not cover BNF. But BNF is essentially a notation for the CFG formalism, which PEG is not.
>>>
>>> It seems your main beef with PEG is the problem that has been described as "language hiding" or "prefix capture": the fact that a rule like `A ::= a | aa` will not match the input "aa".
>>>
>>> This is indeed an important problem of PEGs. But the way I see it, it's a dual to the problem of ambiguity in CFGs. The two preclude one another: PEGs are unambiguous but have prefix capture, CFGs do not have prefix capture but can be ambiguous. The detection of both issues is provably intractable (statically), and the two problems may be more related than we know, as a relatively similar machinery can be put to work to do partial detection of both ambiguity in CFGs and prefix capture in PEGs, cf. "Modular Syntax Demands Verification" by Sylvain Schmitz (*).
>>>
>>> (*) http://www.i3s.unice.fr/~mh/RR/2006/RR-06.32-S.SCHMITZ.pdf
>>>
>>> I have written a fair amount of both PEGs and CFGs, and both seem about equally difficult to write. Non-general CFG formalisms (LALR, LL) are much harder, and so are PEG tools that have no support for building left-associative parse trees.
>>>
>>> I also would never advise anyone not to test a grammar, whatever the formalism. I've certainly never written a big chunk of grammar without making a mistake, whatever the formalism. And in practice, the overwhelming majority of these mistakes were about neither prefix capture nor ambiguity.
>>>
>>> If your practical experience differs, I'd be interested to hear about it.
>>>
>>> ---
>>>
>>> Regarding new developments, here are a couple of ideas off the top of my head.
>>> The first two items of the list are not random, since they are things I worked on in my research (http://norswap.com/publications/).
>>>
>>> - Left-recursion and left-associativity handling in PEG, since it fixes the foremost problem (in my opinion) with PEGs.
>>>
>>> - Context-sensitive parsing.
>>>
>>> You already mention monadic parsing (but do not emphasize its context-sensitive properties). Another oldie that deserves to be mentioned is Prolog-style definite clause grammars (DCGs).
>>>
>>> There are some other works worth mentioning if this is a topic of interest for you.
>>>
>>> My own work in the area is essentially pushing further an idea which is expressed in a lovely manner here:
>>>
>>> > Recursive descent does have a huge advantage, one which, despite its severe limitations, will save it from obsolescence time and again. Hand-written recursive descent is essentially calling subroutines. Adding custom modification to recursive descent is very straight-forward.
>>>
>>> What I add is making it so that these subroutines may manipulate state in a way that is safe in the presence of backtracking, which enables context-sensitive parsing.
>>>
>>> - Packrat parsing, if only because it enables linear-time parsing of PEG, a class of languages that may still strictly include the whole of CFG — as far as I know, we do not know of a simple language that can be expressed in CFG but not in PEG (we do know of languages expressible in PEG but not in CFG).
>>>
>>> Note that, even if PEG does indeed include CFG, that does not imply the existence of an algorithm to convert from one to the other.
>>>
>>> - Tackling the problem of ambiguity formally: "Disambiguating Grammars with Tree Automata", https://michaeldadams.org/papers/disambiguating-grammars/
>>>
>>> - GLL parsing probably deserves a mention for taking an LL-style machinery to the point of finally parsing fully general CFGs through the use of Tomita-style graph-structured stacks: http://www.cs.rhul.ac.uk/research/languages/csle/GLLparsers.html
>>>
>>> - Similarly, and while I'm not a fan of the way this work is marketed, parsing using the Brzozowski derivative is certainly an interesting direction in parsing: http://matt.might.net/articles/parsing-with-derivatives/
>>>
>>> - This might be outside the scope of your timeline, but there is also a whole lot of work (some quite old) on "semi-parsing", i.e. fuzzy or partial parsing: http://grammarware.net/text/2014/semiparsing.pdf
>>>
>>> ---
>>>
>>> There you have it. Let me know what you think, and I'm looking forward to talking with you! I hope I can help the project in any capacity that I'm able.
>>>
>>> Cheers,
>>>
>>> Nicolas LAURENT
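
To make the prefix-capture point discussed above concrete, here is a minimal sketch in Python (the function names are hypothetical and the code is not taken from any tool mentioned in this thread). It shows why the rule `A ::= a | aa` fails to match the input "aa" under PEG's ordered choice, while the same rule read as CFG alternation does match it:

```python
# Minimal illustration of "prefix capture" / "language hiding".
# Rule:  A ::= "a" | "aa"    Input: "aa"
# A parse succeeds only if the whole input is consumed.

def peg_A(s, pos):
    """PEG semantics: try alternatives in order, commit to the first that matches."""
    if s.startswith("a", pos):       # the first alternative "a" succeeds locally...
        return pos + 1               # ...so the "aa" alternative is never tried
    if s.startswith("aa", pos):
        return pos + 2
    return None

def cfg_A(s, pos):
    """CFG-style semantics: keep every alternative that matches."""
    ends = []
    if s.startswith("a", pos):
        ends.append(pos + 1)
    if s.startswith("aa", pos):
        ends.append(pos + 2)
    return ends

inp = "aa"
print(peg_A(inp, 0) == len(inp))                    # False: PEG consumes only one "a"
print(any(e == len(inp) for e in cfg_A(inp, 0)))    # True: the CFG reading covers "aa"
```

The committed choice is exactly what makes a PEG unambiguous, and also what hides the longer alternative; that is the duality with CFG ambiguity described in the email above.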
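On the idea of recursive-descent subroutines that manipulate state safely in the presence of backtracking: the sketch below only illustrates the general mechanism (an undo log that is rolled back whenever an alternative fails), with made-up names; it is not Nicolas's actual design and makes no claim about his implementation.

```python
# Illustrative sketch (hypothetical names): parse functions may mutate shared
# state, and a change log lets the parser roll those mutations back whenever
# it backtracks, which is what makes context-sensitive rules safe to write.

class ParseState:
    def __init__(self):
        self.symbols = []        # example context: a stack of declared names
        self.log = []            # undo log of (action, payload) entries

    def declare(self, name):
        self.symbols.append(name)
        self.log.append(("pop", None))          # how to undo this change

    def snapshot(self):
        return len(self.log)

    def restore(self, mark):
        while len(self.log) > mark:             # roll back to the snapshot
            action, _ = self.log.pop()
            if action == "pop":
                self.symbols.pop()

def try_alternative(state, alternative):
    """Run one alternative; undo its side effects if it fails."""
    mark = state.snapshot()
    if alternative(state):
        return True
    state.restore(mark)                          # backtrack: state is as before
    return False

# Usage: an alternative that declares a symbol but then fails leaves no trace.
st = ParseState()
try_alternative(st, lambda s: (s.declare("x"), False)[-1])
print(st.symbols)    # []  -- the failed alternative's declaration was undone
```

Because every mutation is paired with an undo action, a failed alternative leaves the context (here, the table of declared symbols) exactly as it found it, which is what lets context-sensitive rules compose with backtracking.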
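Finally, a minimal sketch of the packrat idea mentioned in the list: memoize the result of each (rule, position) pair so that a backtracking PEG parser does a bounded amount of work per input position, giving linear time in the input for a fixed grammar. The grammar and names here are made up for illustration.

```python
# Illustrative sketch: packrat parsing memoizes the result of each
# (rule, position) pair, so each pair is computed at most once.

from functools import lru_cache

INPUT = "aaab"

@lru_cache(maxsize=None)
def parse_A(pos):
    """A <- "a" A / "b"   (each (rule, pos) result is computed at most once)."""
    if INPUT.startswith("a", pos):
        sub = parse_A(pos + 1)
        if sub is not None:
            return sub
    if INPUT.startswith("b", pos):
        return pos + 1
    return None                      # failure is memoized too

print(parse_A(0) == len(INPUT))      # True: the whole input is consumed
```

In a real packrat parser there is one memo entry per (nonterminal, position) pair, so the total work is proportional to the number of such pairs, i.e. linear in the input length for a fixed grammar.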
