On Sat, Sep 20, 2014 at 11:57:58AM -0400, Josh Grams wrote: > On 2014-09-20 02:27PM, Loup Vaillant-David wrote: > >Actually, you don't need the back pointers. Plain Earley items are > >enough. Even better, you don't need all the items. You only need the > >completed ones. > > Sure, it's just a classic space/time tradeoff.
Not exactly. My current guess is that not using back pointer yields simpler code. Reconstruction is simpler than back pointers management. Performance (memory or CPU) is secondary. > >This business with back pointers is needlessly complicated, I found. > > Hmm...did you try drawing out where they point to? I did. It gave me much insight about the structure of ambiguity. And I eventually found the main problem with back pointers: they point *back*. By default, Earley recognisers store their Earley items in such a way that reconstruction (or back pointer following) happens from right to left. Which means the first ambiguities will be detected at the *end* of the parse. In parsing expression grammars, it's very intuitive. Imagine you parse this rule: A -> B C D So you parse B, then C, then D (they are written in that order, after all). If you detect an ambiguity in B (or anything below it), prioritised choice will take care of it. Now imagine there are two ways to parse the following input: _B_ _C_ _D_ -- first possibility / \ / \ / \ a b c d e f -- input \_______/ | \___/ B C D -- second possibility The parsing expression grammar will first solve the ambiguity at the B level. Once B is parsed, there is only one possibility. Now with Earley items, since you start from the end, you will first deal with D, then with C— Aha! there's an ambiguity! So you solve it at C's level, leaving only one possibility for B. So, instead of doing the prioritised choice at B, where a user would expect it, you do it at C, quite possibly yielding a different choice. --- So I knew I had to do something to start from the beginning, instead of from the end. With the back pointers, it meant transforming the back pointers into forward pointers. Say you have Three items, A, B and C. B has a back pointer to A, and that back pointer has a down pointer to C. You need to add a forward pointer to A that point to B. And that forward pointer needs a down pointer that points to C. So you go from: A <--+--- B V C To A ---+--> B V C And of course, those pointers are not unique, so you need to maintain a *list* of those… So I just threw up my arms, and tried another approach. I thought for a moment that we *needed* those back pointers. But then I have read that paper about chart parsing, and realise you could do everything with completed items alone. This has 2 advantages: - Completed Earley items are simpler than the others, and very easy to reverse. (Unlike the back pointer stuff.) - No need to modify the recogniser to have your parser. They are completely independent, making them easier to learn. --- TL;DR: ignore Elizabeth Scott's "SPPF-style Parsing for Earley Recognisers". It's not useful when you only want to select one parse tree. Loup. _______________________________________________ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc