Re: [fonc] Earley Parsing Explained (incomplete first draft)

Loup Vaillant-David Sat, 20 Sep 2014 11:00:22 -0700

On Sat, Sep 20, 2014 at 11:57:58AM -0400, Josh Grams wrote:
> On 2014-09-20 02:27PM, Loup Vaillant-David wrote:
> >Actually, you don't need the back pointers.  Plain Earley items are
> >enough.  Even better, you don't need all the items.  You only need the
> >completed ones.
> 
> Sure, it's just a classic space/time tradeoff.


Not exactly.  My current guess is that not using back pointer yields
simpler code.  Reconstruction is simpler than back pointers
management.  Performance (memory or CPU) is secondary.


> >This business with back pointers is needlessly complicated, I found.
> 
> Hmm...did you try drawing out where they point to?

I did.  It gave me much insight about the structure of ambiguity.  And
I eventually found the main problem with back pointers: they point
*back*.

By default, Earley recognisers store their Earley items in such a way
that reconstruction (or back pointer following) happens from right to
left.  Which means the first ambiguities will be detected at the *end*
of the parse.

In parsing expression grammars, it's very intuitive.  Imagine you
parse this rule:

  A -> B C D

So you parse B, then C, then D (they are written in that order, after
all).  If you detect an ambiguity in B (or anything below it),
prioritised choice will take care of it.  Now imagine there are two
ways to parse the following input:

   _B_     _C_     _D_    -- first  possibility
  /   \   /   \   /   \
  a   b   c   d   e   f   -- input
  \_______/   |   \___/
      B       C     D     -- second possibility

The parsing expression grammar will first solve the ambiguity at the B
level.  Once B is parsed, there is only one possibility.  Now with
Earley items, since you start from the end, you will first deal with
D, then with C—

Aha! there's an ambiguity!  So you solve it at C's level, leaving only
one possibility for B.  So, instead of doing the prioritised choice at
B, where a user would expect it, you do it at C, quite possibly
yielding a different choice.

---

So I knew I had to do something to start from the beginning, instead of
from the end.

With the back pointers, it meant transforming the back pointers into
forward pointers.  Say you have Three items, A, B and C.  B has a back
pointer to A, and that back pointer has a down pointer to C.  You need
to add a forward pointer to A that point to B.  And that forward
pointer needs a down pointer that points to C.  So you go from:

  A <--+--- B
       V
       C
To

  A ---+--> B
       V
       C

And of course, those pointers are not unique, so you need to maintain
a *list* of those…  So I just threw up my arms, and tried another
approach.  I thought for a moment that we *needed* those back
pointers.  But then I have read that paper about chart parsing, and
realise you could do everything with completed items alone.

This has 2 advantages:

  - Completed Earley items are simpler than the others, and very easy
    to reverse.  (Unlike the back pointer stuff.)
  - No need to modify the recogniser to have your parser.  They are
    completely independent, making them easier to learn.

---

TL;DR: ignore Elizabeth Scott's "SPPF-style Parsing for Earley
Recognisers".  It's not useful when you only want to select one parse
tree.

Loup.
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc

Re: [fonc] Earley Parsing Explained (incomplete first draft)

Reply via email to