Re: Parsing Timeline Suggestions

Jeffrey Kegler Fri, 29 Mar 2019 16:11:53 -0700

@Nicholas: Just to confirm that I have not forgotten and that I *do* intend 
to reply to the other points in your post.  But since you're not the only 
one who raises some of these questions, I thought it worthwhile to do a 
careful, well-researched job of it, so I've been doing a bit of reading.  
Not too much longer, I hope.


Thanks, jeffrey

On Wednesday, February 27, 2019 at 10:12:52 PM UTC-5, Jeffrey Kegler wrote:
>
> Your PEG work should be interesting and I plan to follow it.  You made 
> quite a number of other significant (non-PEG) points and I am aware that I 
> still owe you responses to them.  That's underway.  It's taking some time 
> because I am doing some reading for it.
>
> best, jeffrey
>
> On Wed, Feb 27, 2019 at 5:12 PM <[email protected]> wrote:
>
>> Thanks for your answer!
>>
>> I do mostly agree, PEG is not a cure-all. But to make my opinion clear 
>> (and I do
>> like to think it's an informed opinion rooted in experience) neither is it
>> significantly worse in usability than CFGs.
>>
>> I must admit I missed that note in the footnotes. It doesn't do much for 
>> me in
>> terms of nuance, honestly. It's in a reference after a phrase that reads 
>> "PEG,
>> in safe use, would essentially be LL(1)". If that's the part of saying 
>> that "PEG
>> has safe uses" then it isn't nuance, but rather a judgement that says 
>> that PEG
>> is either dangerous or crippled (as LL(1) is).
>>
>> It all comes back to this idea that PEG is "dangerous". I'm open to 
>> discussing
>> that — what I currently think is that it really means that it is 
>> **unintuitive**
>> to some people (as I do explain in my answer to Leszek above).
>>
>> One of the (multiple) things that make me believe this is precisely this
>> reference you cited in support of "PEG, in safe use, would essentially be
>> LL(1)". I think this is a gross misrepresentation of the claims of the 
>> paper.
>> I'd stake 99% confidence that if you'd ask the authors if the paper says 
>> or
>> shows that, they'd reply with an emphatic "no".
>>
>> Consider the abstract:
>>
>> > Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs)have
>> > several similarities and a few differences in both their syntax and 
>> semantics,
>> > but they are usually presented through formalisms that hinder a proper
>> > comparison. In this paper we present a new formalism for CFGs that 
>> highlights
>> > the similarities and differences between them. The new formalism 
>> borrows from
>> > PEGs the use of parsing expressions and the recognition-based 
>> semantics. We
>> > show how one way of removing non-determinism from this formalism yields 
>> a
>> > formalism with the semantics of PEGs. We also prove, based on these new
>> > formalisms, how LL(1) grammars define the same language whether 
>> interpreted as
>> > CFGs or as PEGs, and also show how strong-LL(k), right-linear, and 
>> LL-regular
>> > grammars have simple language-preserving translations from CFGs to 
>> PEGs. Once
>> > these classes of CFGs can be automatically translated to equivalent 
>> PEGs, we
>> > can reuse classic top-down grammars in PEG-based tools.
>>
>> This essentially says that LL(1) grammars are the subset of both PEG and 
>> CFG
>> that share the same behaviour.
>>
>> Hence your definition of "safe" seems to be "behaving similarly to CFGs".
>>
>> But like I said, and you said in turn, everything is entitled to their
>> own opinions.
>>
>> I do however think that if a sizeable portion of the parsing community 
>> would
>> agree that PEG are valuable, it might be worthwhile to point out that your
>> opinion is in fact, an opinion that does is not consensual by any stretch.
>>
>> Similarly, why not mention some work building on PEG? In particular
>> left-recursion and packrat parsing. Even if you do think they build on a 
>> flawed
>> base, certainly we can agree these results are very significant for those 
>> of us
>> who do use PEG?
>>
>> About quips, while I don't mind them in general, they seem to me here to 
>> be
>> precisely an example of the "rope-pulling" that you decry. Wouldn't it be 
>> more
>> high and mighty to simply list facts and publications and let people form 
>> their
>> own opinion?
>>
>> Now, why do I care? I don't mind much about people being "wrong on the 
>> internet"
>> (™) — that ship sailed long ago — but the idea of a parsing timeline is
>> exciting to me. I'd love to recommend it and advertise it, but I find the 
>> bias
>> towards the end prevents me to do so. I think many people might share this
>> opinion.
>>
>> The way I tend to perceive "timelines" and other "histories" is as 
>> somewhat
>> authoritative and unbiased — at least, that's the platonic ideal. Think 
>> about
>> the standard that Wikipedia tries to enforce for instance. Wikipedia does 
>> not
>> only list consensus — opinions and disputes are relevant, after all — but
>> clearly does mark them as such.
>>
>> Do you think there is a way for the timeline to move in that direction?
>>
>> On Wednesday, February 20, 2019 at 1:53:31 AM UTC+1, Jeffrey Kegler wrote:
>>>
>>> Thanks for your kind words.  In this, I'll just respond to the PEG 
>>> points.
>>>
>>> That part of my timeline has caused more blowback than any other part, 
>>> not unexpectedly.  The quips are aimed, not at people like yourself who, 
>>> while PEG optimists, know its difficulties and do not mislead others.  
>>> Unfortunately, at the time I wrote that version of the timeline (and still 
>>> I suspect) you are the minority of pro-PEG voices out on the Web.
>>>
>>> Many presentations of PEG present it as a "throw anything at me" 
>>> solution.  These less careful voices were the loudest and cause a lot of 
>>> folks to waste time pushing a rope.  My quips were aimed to catch the 
>>> attention of these folks and they did their job.
>>>
>>> I've done this before, with yacc, for the same reason, and caught the 
>>> same kind of heat.  People routinely were recommending yacc for "serious 
>>> parsing" either in ignorance or reckless disregard of the actual experience 
>>> with it.  So I did a "bovicide" series of blog posts.  Nowadays I don't 
>>> diss yacc, because it's no longer the danger to parsing newbies that it was.
>>>
>>> If you read my entry carefully, and get beyond the quips (as I am sure 
>>> you did), you see that I *do* say there are safe uses of PEG, and in the 
>>> footnote concede that my description in the text is "negative".  I go on to 
>>> point the reader toward the pro-PEG literature -- a literature of which the 
>>> less careful PEG advocates are usually unaware.
>>>
>>> So to my careless reader, a warning about PEG gets across, and to my 
>>> careful reader, I point out that, while I am more optimistic about 
>>> Earley/Leo, there is safe use of PEG and a literature worth consulting.
>>>
>>> In fact, I think there are 5 "forever" algorithms, parsing algorihms 
>>> that are of permanent significance, and PEG is one of them.  (The other 4 
>>> are regular expressions, recursive descent, Earley/Leo and Sakai's 
>>> algorihm, aka CYK).
>>>
>>> Getting to the merits, I've heard before that some find the PEG syntax 
>>> more intuitive than BNF.  Everyone is the ultimate authority on their own 
>>> intuitions, so I won't dispute this.  And you are more optimistic about 
>>> research into PEG than I am, clearly.  But as Yogi Berra says, "predictions 
>>> are difficult, especially about the future".  My timeline does not engage 
>>> in predictions, although it certainly reflects my optimism wrt Earley/Leo.  
>>> I do claim to be entitled to greater confidence in my predictions on the 
>>> grounds, at 65, I am much less likely to be around if they fall flat. :-)
>>>
>>> I'll tackle the points about why the timeline includes/excludes what it 
>>> does separately.
>>>
>>> On Tue, Feb 19, 2019 at 5:47 AM <[email protected]> wrote:
>>>
>>>> (Following up from 
>>>> https://twitter.com/jeffreykegler/status/1097628806091816960)
>>>>
>>>> Dear Jeffrey,
>>>>
>>>> I was delighted when I found about your parsing timeline a couple of 
>>>> months ago.
>>>> I'm a PhD student researching parsing myself and it's great to see 
>>>> resources
>>>> like this bringing awareness to what happened in the field, and it's a 
>>>> great
>>>> reference for people in the field as well. It also got me more 
>>>> interested in
>>>> Earley and your own work of it, which is quite cool.
>>>>
>>>> That being said, I do have some insatisfactions about the timeline, 
>>>> especially
>>>> the end of it. It's always easy to criticize, so I would like to offer 
>>>> my help
>>>> instead, to see if I can contribute something useful to the timeline.
>>>>
>>>> According to me, there are two points that could use improvement. 
>>>> First, I feel
>>>> top-down recursive-descent formalizations, especially PEG, are being 
>>>> unfairly
>>>> dismissed. Second, it seems to me like there are quite a few new 
>>>> developments
>>>> that would deserve inclusion at the end of the timeline (especially by
>>>> comparison to the level of details towards the start of the timeline).
>>>>
>>>> I don't pretend to have the objective truth, but the goal is to have a
>>>> productive conversation that can hopefully lead to improved accuracy.
>>>>
>>>> ---
>>>>
>>>> Regarding PEG, I find the whole description minus the first paragraph 
>>>> to be
>>>> rather contestable and (I feel) uncharitable.
>>>>
>>>> It's entirely correct that PEG does not cover BNF. But BNF is 
>>>> essentially a
>>>> notation for the CFG formalism, which PEG is not.
>>>>
>>>> It seems your main beef with PEG is the problem that has been described 
>>>> as
>>>> "language hiding" or "prefix capture": the fact that a rule like `A ::= 
>>>> a | aa`
>>>> will not match the input "aa".
>>>>
>>>> This is indeed an important problem of PEGs. But the way I see it, it's 
>>>> a dual
>>>> to the problem of ambiguity in CFGs. Both preclude one another: PEGs are
>>>> unambiguous but have prefix capture, CFGs do not have prefix capture 
>>>> but are
>>>> ambiguous. The detection of both issue is provably intractable 
>>>> (statically) and
>>>> may be more related than we know, as a relatively similar machinery can 
>>>> be put
>>>> to work to do partial detection of both ambiguity in CFGs and prefix 
>>>> capture in
>>>> PEGs, cf. "Modular Syntax Demands Verification" by Sylvain Schmitz (*).
>>>>
>>>> (*) http://www.i3s.unice.fr/~mh/RR/2006/RR-06.32-S.SCHMITZ.pdf
>>>>
>>>> I have written a fair deal of both PEGs and CFGs and both seem about 
>>>> equally
>>>> difficult to write. Non-general CFG formalism (LALR, LL) are much 
>>>> harder, and so
>>>> are PEG tools that have no support to build left-associative parse 
>>>> trees.
>>>>
>>>> I also would never advise anyone not to test a grammar, whatever the 
>>>> formalism.
>>>> I've certainly never written a big chunk of grammar without making a 
>>>> mistake
>>>> whatever the formalism. And in practice, the overwhelming majority of 
>>>> these
>>>> mistakes weren't about either prefix capture nor ambiguity.
>>>>
>>>> If your practical experience differs, I'd be interested to hear about 
>>>> it.
>>>>
>>>> ---
>>>>
>>>> Regarding new developments, here are a couple of ideas of the top of my 
>>>> head.
>>>> The first two items of the list are not random, since those were things 
>>>> I worked
>>>> on in my research (http://norswap.com/publications/)
>>>>
>>>> - Left-recursion and left-associativity handling in PEG, since it fixes 
>>>> the
>>>>   foremost problem (in my opinion) with PEGs
>>>>
>>>> - Context-sensitive parsing
>>>>
>>>>   You already already mention monadic parsing (but do not emphasize its
>>>>   context-sensitive properties). Another oldie that deserves to be 
>>>> mentionned
>>>>   are Prolog-style definite clause grammars (DCGs).
>>>>
>>>>   There are some other works worth mentioning if this is a topic of 
>>>> interest for
>>>>   you.
>>>>
>>>>   My own work in the area is essentially pushing an idea which 
>>>> expressed in a
>>>>   lovely manner:
>>>>
>>>>   > Recursive descent does have a huge advantage, one which, despite 
>>>> its severe
>>>>   > limitations, will save it from obsolescence time and again. 
>>>> Hand-written
>>>>   > recursive descent is essentially calling subroutines. Adding custom
>>>>   > modification to recursive descent is very straight-forward.
>>>>
>>>>   But making it so that these subroutines may manipulate state in a way 
>>>> that is
>>>>   safe in the presence of backtracking, enabling context-sensitive 
>>>> parsing.
>>>>
>>>> - Packrat parsing, if only because it enables O(1) parsing of PEG, a 
>>>> class of
>>>>   languages that may still strictly include the whole of CFG — as far 
>>>> as I know,
>>>>   we do not know of a simple language that can be expressed in CFG but 
>>>> not in
>>>>   PEG (we do know language expressible in PEG but not in CFGs).
>>>>
>>>>   Note that, even if PEG does indeed include CFG, that does not imply 
>>>> the
>>>>   existance of an algorithm to convert from one to the other.
>>>>
>>>> - Tackling the problem of ambiguity formally,
>>>>   "Disambiguating Grammars with Tree Automata"
>>>>   https://michaeldadams.org/papers/disambiguating-grammars/
>>>>
>>>> - GLL parsing probable deserves a mention for mention for taking a 
>>>> LL-style
>>>>   machinery to the point of finally parsing fully general CFGs through 
>>>> the use
>>>>   of Tomita-style graph structured stacks
>>>>   http://www.cs.rhul.ac.uk/research/languages/csle/GLLparsers.html
>>>>
>>>> - Similarly, and while I'm not a fan of the way this work is marketed, 
>>>> parsing
>>>>   using the Brzozowski derivative is certainly an interesting direction 
>>>> in
>>>>   parsing: http://matt.might.net/articles/parsing-with-derivatives/
>>>>
>>>> - This might be outside the scope of your timeline, but there is also a 
>>>> whole
>>>>   lot of work (some quite old) on "semi-parsing", i.e. fuzzy or partial 
>>>> parsing:
>>>>   http://grammarware.net/text/2014/semiparsing.pdf
>>>>
>>>> ---
>>>>
>>>> There you have it. Let me know what you think, and I'm looking forward 
>>>> to
>>>> talking with you! I hope I can help the project in any capacity that 
>>>> I'm able.
>>>>
>>>> Cheers,
>>>>
>>>> Nicolas LAURENT
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "marpa parser" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "marpa parser" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Parsing Timeline Suggestions

Reply via email to