Re: [fonc] Earley Parsing Explained (incomplete first draft)
On 2014-09-20 07:59PM, Loup Vaillant-David wrote: By default, Earley recognisers store their Earley items in such a way that reconstruction (or back pointer following) happens from right to left. Which means the first ambiguities will be detected at the *end* of the parse. But a depth-first traversal turns that around. In parsing expression grammars, it's very intuitive. Imagine you parse this rule: A - B C D So you parse B, then C, then D (they are written in that order, after all). If you detect an ambiguity in B (or anything below it), prioritised choice will take care of it. Now imagine there are two ways to parse the following input: _B_ _C_ _D_-- first possibility / \ / \ / \ a b c d e f -- input \___/ | \___/ B C D -- second possibility The left back pointers move back through the rule. So a pre-order traversal will go like this (I'm using tildes because it was annoying to copy/paste the bullet): A - B C D ~ -- at this step you apply A = B C D. A - B C ~ D A - B ~ C D A - ~ B C D No more left pointers, so return back up to A - B ~ C D. Follow its right pointer to B ... ambiguous! So choose between: B - a b ~ B - a b c ~ Then return back up to A - B C ~ D and follow its right pointer to deal with C, etc. --Josh ___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc
Re: [fonc] Earley Parsing Explained (incomplete first draft)
On 2014-09-20 06:58AM, Josh Grams wrote: Mainly the recognizer. Did you figure that out? I think the important insight is that matches added by scanning and completion operations represent a (partial) derivation step. Gah. Not scanning. What am I saying? Only completion. The completion operation represents a rule being applied all the places where it presently fits. Yes? So if we save those pointers we can just hook them up starting from the root(s) to get the parse forest? Sorry, I've only spent three or four hours looking at Earley parsing: I'm not all the way up to speed yet. --Josh ___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc
Re: [fonc] Earley Parsing Explained (incomplete first draft)
On Sat, Sep 20, 2014 at 11:57:58AM -0400, Josh Grams wrote: On 2014-09-20 02:27PM, Loup Vaillant-David wrote: Actually, you don't need the back pointers. Plain Earley items are enough. Even better, you don't need all the items. You only need the completed ones. Sure, it's just a classic space/time tradeoff. Not exactly. My current guess is that not using back pointer yields simpler code. Reconstruction is simpler than back pointers management. Performance (memory or CPU) is secondary. This business with back pointers is needlessly complicated, I found. Hmm...did you try drawing out where they point to? I did. It gave me much insight about the structure of ambiguity. And I eventually found the main problem with back pointers: they point *back*. By default, Earley recognisers store their Earley items in such a way that reconstruction (or back pointer following) happens from right to left. Which means the first ambiguities will be detected at the *end* of the parse. In parsing expression grammars, it's very intuitive. Imagine you parse this rule: A - B C D So you parse B, then C, then D (they are written in that order, after all). If you detect an ambiguity in B (or anything below it), prioritised choice will take care of it. Now imagine there are two ways to parse the following input: _B_ _C_ _D_-- first possibility / \ / \ / \ a b c d e f -- input \___/ | \___/ B C D -- second possibility The parsing expression grammar will first solve the ambiguity at the B level. Once B is parsed, there is only one possibility. Now with Earley items, since you start from the end, you will first deal with D, then with C— Aha! there's an ambiguity! So you solve it at C's level, leaving only one possibility for B. So, instead of doing the prioritised choice at B, where a user would expect it, you do it at C, quite possibly yielding a different choice. --- So I knew I had to do something to start from the beginning, instead of from the end. With the back pointers, it meant transforming the back pointers into forward pointers. Say you have Three items, A, B and C. B has a back pointer to A, and that back pointer has a down pointer to C. You need to add a forward pointer to A that point to B. And that forward pointer needs a down pointer that points to C. So you go from: A --+--- B V C To A ---+-- B V C And of course, those pointers are not unique, so you need to maintain a *list* of those… So I just threw up my arms, and tried another approach. I thought for a moment that we *needed* those back pointers. But then I have read that paper about chart parsing, and realise you could do everything with completed items alone. This has 2 advantages: - Completed Earley items are simpler than the others, and very easy to reverse. (Unlike the back pointer stuff.) - No need to modify the recogniser to have your parser. They are completely independent, making them easier to learn. --- TL;DR: ignore Elizabeth Scott's SPPF-style Parsing for Earley Recognisers. It's not useful when you only want to select one parse tree. Loup. ___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc
[fonc] Earley Parsing Explained (incomplete first draft)
Hi, After spending months banging my head over Earley Parsing, I have decided to write a tutorial. Ian once said Earley parsing is simple and easy to implement. I agree with simple, but not with easy. The required background knowledge is not trivial. This tutorial is an attempt to gather this knowledge in one place. Nothing fancy, no deep math, no proof of correctness. Just a (hopefully) intuitive explanation of the concepts, needed to implement Earley parsing. The goal is to help competent programmers who know little about parsing to write their own Earley parsing framework. So far, I have done most of the recogniser. The following pages are done, and up for review: http://loup-vaillant.fr/tutorials/earley-parsing/ http://loup-vaillant.fr/tutorials/earley-parsing/what-and-why http://loup-vaillant.fr/tutorials/earley-parsing/chart-parsing http://loup-vaillant.fr/tutorials/earley-parsing/recogniser Questions and criticisms are most welcome. I'd like to know about any factual inaccuracy, poor wording, confusing explanation… please don't hesitate to question anything, even the structure of this tutorial. Enjoy, and thanks, Loup. ___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc
Re: [fonc] Earley Parsing Explained (incomplete first draft)
On 19 Sep 2014, at 7:11 am, Loup Vaillant-David l...@loup-vaillant.fr wrote: Hi, After spending months banging my head over Earley Parsing, I have decided to write a tutorial. Ian once said Earley parsing is simple and easy to implement. I agree with simple, but not with easy. The required background knowledge is not trivial. Easy is pretty subjective! What's easy to my a 6 year old is not going to be easy for a 50 year old, and vice versa. Perhaps you mean approachable to someone of average intellect and cultural heritage? This tutorial is an attempt to gather this knowledge in one place. Nothing fancy, no deep math, no proof of correctness. Just a (hopefully) intuitive explanation of the concepts, needed to implement Earley parsing. The goal is to help competent programmers who know little about parsing to write their own Earley parsing framework. So far, I have done most of the recogniser. The following pages are done, and up for review: http://loup-vaillant.fr/tutorials/earley-parsing/ http://loup-vaillant.fr/tutorials/earley-parsing/what-and-why http://loup-vaillant.fr/tutorials/earley-parsing/chart-parsing http://loup-vaillant.fr/tutorials/earley-parsing/recogniser Questions and criticisms are most welcome. I'd like to know about any factual inaccuracy, poor wording, confusing explanation… please don't hesitate to question anything, even the structure of this tutorial. Enjoy, and thanks, Loup. I really like your writing style! One criticism I'd have is there aren't any pointers to people who don't have programming language construction backgrounds. So, perhaps adding a pre-requisite reading list at the beginning aimed at bringing someone who has finished high school and nothing else up to speed... This is perhaps my general criticism of most technical work. The assumed knowledge is high, and it shouldn't be. As more and more people get involved in computing, this means there can be less and less assumed backgrounds. I wrote about good code and by extension good writing a while back here: http://www.genericoverlords.com/good_coding Julian http://www.getcontented.com.au/ - You Need GetContented - Get Your Website Happy. :)___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc