Re: Parsing Timeline Suggestions

Jeffrey Kegler Mon, 01 Apr 2019 07:04:56 -0700

@Nicholas: I've posted the 2nd part of my reply to my blog: "Sherlock
Holmes and the Case of the Missing Parsing Solution"
<http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2019/03/methodology.html>
.


Many of your questions involved the timeline's methodology and I felt I had
to clarify myself on that issue first.  Hopefully this post does that.

I still owe you answers to some of the specific questions.



On Wed, Feb 27, 2019 at 5:12 PM <[email protected]> wrote:

> Thanks for your answer!
>
> I do mostly agree, PEG is not a cure-all. But to make my opinion clear
> (and I do
> like to think it's an informed opinion rooted in experience) neither is it
> significantly worse in usability than CFGs.
>
> I must admit I missed that note in the footnotes. It doesn't do much for
> me in
> terms of nuance, honestly. It's in a reference after a phrase that reads
> "PEG,
> in safe use, would essentially be LL(1)". If that's the part of saying
> that "PEG
> has safe uses" then it isn't nuance, but rather a judgement that says that
> PEG
> is either dangerous or crippled (as LL(1) is).
>
> It all comes back to this idea that PEG is "dangerous". I'm open to
> discussing
> that — what I currently think is that it really means that it is
> **unintuitive**
> to some people (as I do explain in my answer to Leszek above).
>
> One of the (multiple) things that make me believe this is precisely this
> reference you cited in support of "PEG, in safe use, would essentially be
> LL(1)". I think this is a gross misrepresentation of the claims of the
> paper.
> I'd stake 99% confidence that if you'd ask the authors if the paper says or
> shows that, they'd reply with an emphatic "no".
>
> Consider the abstract:
>
> > Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs)have
> > several similarities and a few differences in both their syntax and
> semantics,
> > but they are usually presented through formalisms that hinder a proper
> > comparison. In this paper we present a new formalism for CFGs that
> highlights
> > the similarities and differences between them. The new formalism borrows
> from
> > PEGs the use of parsing expressions and the recognition-based semantics.
> We
> > show how one way of removing non-determinism from this formalism yields a
> > formalism with the semantics of PEGs. We also prove, based on these new
> > formalisms, how LL(1) grammars define the same language whether
> interpreted as
> > CFGs or as PEGs, and also show how strong-LL(k), right-linear, and
> LL-regular
> > grammars have simple language-preserving translations from CFGs to PEGs.
> Once
> > these classes of CFGs can be automatically translated to equivalent
> PEGs, we
> > can reuse classic top-down grammars in PEG-based tools.
>
> This essentially says that LL(1) grammars are the subset of both PEG and
> CFG
> that share the same behaviour.
>
> Hence your definition of "safe" seems to be "behaving similarly to CFGs".
>
> But like I said, and you said in turn, everything is entitled to their
> own opinions.
>
> I do however think that if a sizeable portion of the parsing community
> would
> agree that PEG are valuable, it might be worthwhile to point out that your
> opinion is in fact, an opinion that does is not consensual by any stretch.
>
> Similarly, why not mention some work building on PEG? In particular
> left-recursion and packrat parsing. Even if you do think they build on a
> flawed
> base, certainly we can agree these results are very significant for those
> of us
> who do use PEG?
>
> About quips, while I don't mind them in general, they seem to me here to be
> precisely an example of the "rope-pulling" that you decry. Wouldn't it be
> more
> high and mighty to simply list facts and publications and let people form
> their
> own opinion?
>
> Now, why do I care? I don't mind much about people being "wrong on the
> internet"
> (™) — that ship sailed long ago — but the idea of a parsing timeline is
> exciting to me. I'd love to recommend it and advertise it, but I find the
> bias
> towards the end prevents me to do so. I think many people might share this
> opinion.
>
> The way I tend to perceive "timelines" and other "histories" is as somewhat
> authoritative and unbiased — at least, that's the platonic ideal. Think
> about
> the standard that Wikipedia tries to enforce for instance. Wikipedia does
> not
> only list consensus — opinions and disputes are relevant, after all — but
> clearly does mark them as such.
>
> Do you think there is a way for the timeline to move in that direction?
>
> On Wednesday, February 20, 2019 at 1:53:31 AM UTC+1, Jeffrey Kegler wrote:
>>
>> Thanks for your kind words.  In this, I'll just respond to the PEG points.
>>
>> That part of my timeline has caused more blowback than any other part,
>> not unexpectedly.  The quips are aimed, not at people like yourself who,
>> while PEG optimists, know its difficulties and do not mislead others.
>> Unfortunately, at the time I wrote that version of the timeline (and still
>> I suspect) you are the minority of pro-PEG voices out on the Web.
>>
>> Many presentations of PEG present it as a "throw anything at me"
>> solution.  These less careful voices were the loudest and cause a lot of
>> folks to waste time pushing a rope.  My quips were aimed to catch the
>> attention of these folks and they did their job.
>>
>> I've done this before, with yacc, for the same reason, and caught the
>> same kind of heat.  People routinely were recommending yacc for "serious
>> parsing" either in ignorance or reckless disregard of the actual experience
>> with it.  So I did a "bovicide" series of blog posts.  Nowadays I don't
>> diss yacc, because it's no longer the danger to parsing newbies that it was.
>>
>> If you read my entry carefully, and get beyond the quips (as I am sure
>> you did), you see that I *do* say there are safe uses of PEG, and in the
>> footnote concede that my description in the text is "negative".  I go on to
>> point the reader toward the pro-PEG literature -- a literature of which the
>> less careful PEG advocates are usually unaware.
>>
>> So to my careless reader, a warning about PEG gets across, and to my
>> careful reader, I point out that, while I am more optimistic about
>> Earley/Leo, there is safe use of PEG and a literature worth consulting.
>>
>> In fact, I think there are 5 "forever" algorithms, parsing algorihms that
>> are of permanent significance, and PEG is one of them.  (The other 4 are
>> regular expressions, recursive descent, Earley/Leo and Sakai's algorihm,
>> aka CYK).
>>
>> Getting to the merits, I've heard before that some find the PEG syntax
>> more intuitive than BNF.  Everyone is the ultimate authority on their own
>> intuitions, so I won't dispute this.  And you are more optimistic about
>> research into PEG than I am, clearly.  But as Yogi Berra says, "predictions
>> are difficult, especially about the future".  My timeline does not engage
>> in predictions, although it certainly reflects my optimism wrt Earley/Leo.
>> I do claim to be entitled to greater confidence in my predictions on the
>> grounds, at 65, I am much less likely to be around if they fall flat. :-)
>>
>> I'll tackle the points about why the timeline includes/excludes what it
>> does separately.
>>
>> On Tue, Feb 19, 2019 at 5:47 AM <[email protected]> wrote:
>>
>>> (Following up from
>>> https://twitter.com/jeffreykegler/status/1097628806091816960)
>>>
>>> Dear Jeffrey,
>>>
>>> I was delighted when I found about your parsing timeline a couple of
>>> months ago.
>>> I'm a PhD student researching parsing myself and it's great to see
>>> resources
>>> like this bringing awareness to what happened in the field, and it's a
>>> great
>>> reference for people in the field as well. It also got me more
>>> interested in
>>> Earley and your own work of it, which is quite cool.
>>>
>>> That being said, I do have some insatisfactions about the timeline,
>>> especially
>>> the end of it. It's always easy to criticize, so I would like to offer
>>> my help
>>> instead, to see if I can contribute something useful to the timeline.
>>>
>>> According to me, there are two points that could use improvement. First,
>>> I feel
>>> top-down recursive-descent formalizations, especially PEG, are being
>>> unfairly
>>> dismissed. Second, it seems to me like there are quite a few new
>>> developments
>>> that would deserve inclusion at the end of the timeline (especially by
>>> comparison to the level of details towards the start of the timeline).
>>>
>>> I don't pretend to have the objective truth, but the goal is to have a
>>> productive conversation that can hopefully lead to improved accuracy.
>>>
>>> ---
>>>
>>> Regarding PEG, I find the whole description minus the first paragraph to
>>> be
>>> rather contestable and (I feel) uncharitable.
>>>
>>> It's entirely correct that PEG does not cover BNF. But BNF is
>>> essentially a
>>> notation for the CFG formalism, which PEG is not.
>>>
>>> It seems your main beef with PEG is the problem that has been described
>>> as
>>> "language hiding" or "prefix capture": the fact that a rule like `A ::=
>>> a | aa`
>>> will not match the input "aa".
>>>
>>> This is indeed an important problem of PEGs. But the way I see it, it's
>>> a dual
>>> to the problem of ambiguity in CFGs. Both preclude one another: PEGs are
>>> unambiguous but have prefix capture, CFGs do not have prefix capture but
>>> are
>>> ambiguous. The detection of both issue is provably intractable
>>> (statically) and
>>> may be more related than we know, as a relatively similar machinery can
>>> be put
>>> to work to do partial detection of both ambiguity in CFGs and prefix
>>> capture in
>>> PEGs, cf. "Modular Syntax Demands Verification" by Sylvain Schmitz (*).
>>>
>>> (*) http://www.i3s.unice.fr/~mh/RR/2006/RR-06.32-S.SCHMITZ.pdf
>>>
>>> I have written a fair deal of both PEGs and CFGs and both seem about
>>> equally
>>> difficult to write. Non-general CFG formalism (LALR, LL) are much
>>> harder, and so
>>> are PEG tools that have no support to build left-associative parse trees.
>>>
>>> I also would never advise anyone not to test a grammar, whatever the
>>> formalism.
>>> I've certainly never written a big chunk of grammar without making a
>>> mistake
>>> whatever the formalism. And in practice, the overwhelming majority of
>>> these
>>> mistakes weren't about either prefix capture nor ambiguity.
>>>
>>> If your practical experience differs, I'd be interested to hear about it.
>>>
>>> ---
>>>
>>> Regarding new developments, here are a couple of ideas of the top of my
>>> head.
>>> The first two items of the list are not random, since those were things
>>> I worked
>>> on in my research (http://norswap.com/publications/)
>>>
>>> - Left-recursion and left-associativity handling in PEG, since it fixes
>>> the
>>>   foremost problem (in my opinion) with PEGs
>>>
>>> - Context-sensitive parsing
>>>
>>>   You already already mention monadic parsing (but do not emphasize its
>>>   context-sensitive properties). Another oldie that deserves to be
>>> mentionned
>>>   are Prolog-style definite clause grammars (DCGs).
>>>
>>>   There are some other works worth mentioning if this is a topic of
>>> interest for
>>>   you.
>>>
>>>   My own work in the area is essentially pushing an idea which expressed
>>> in a
>>>   lovely manner:
>>>
>>>   > Recursive descent does have a huge advantage, one which, despite its
>>> severe
>>>   > limitations, will save it from obsolescence time and again.
>>> Hand-written
>>>   > recursive descent is essentially calling subroutines. Adding custom
>>>   > modification to recursive descent is very straight-forward.
>>>
>>>   But making it so that these subroutines may manipulate state in a way
>>> that is
>>>   safe in the presence of backtracking, enabling context-sensitive
>>> parsing.
>>>
>>> - Packrat parsing, if only because it enables O(1) parsing of PEG, a
>>> class of
>>>   languages that may still strictly include the whole of CFG — as far as
>>> I know,
>>>   we do not know of a simple language that can be expressed in CFG but
>>> not in
>>>   PEG (we do know language expressible in PEG but not in CFGs).
>>>
>>>   Note that, even if PEG does indeed include CFG, that does not imply the
>>>   existance of an algorithm to convert from one to the other.
>>>
>>> - Tackling the problem of ambiguity formally,
>>>   "Disambiguating Grammars with Tree Automata"
>>>   https://michaeldadams.org/papers/disambiguating-grammars/
>>>
>>> - GLL parsing probable deserves a mention for mention for taking a
>>> LL-style
>>>   machinery to the point of finally parsing fully general CFGs through
>>> the use
>>>   of Tomita-style graph structured stacks
>>>   http://www.cs.rhul.ac.uk/research/languages/csle/GLLparsers.html
>>>
>>> - Similarly, and while I'm not a fan of the way this work is marketed,
>>> parsing
>>>   using the Brzozowski derivative is certainly an interesting direction
>>> in
>>>   parsing: http://matt.might.net/articles/parsing-with-derivatives/
>>>
>>> - This might be outside the scope of your timeline, but there is also a
>>> whole
>>>   lot of work (some quite old) on "semi-parsing", i.e. fuzzy or partial
>>> parsing:
>>>   http://grammarware.net/text/2014/semiparsing.pdf
>>>
>>> ---
>>>
>>> There you have it. Let me know what you think, and I'm looking forward to
>>> talking with you! I hope I can help the project in any capacity that I'm
>>> able.
>>>
>>> Cheers,
>>>
>>> Nicolas LAURENT
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "marpa parser" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Parsing Timeline Suggestions

Reply via email to