Re: [fonc] Earley Parsing Explained (incomplete first draft)

Loup Vaillant-David Sat, 20 Sep 2014 05:28:59 -0700

On Sat, Sep 20, 2014 at 06:58:14AM -0400, Josh Grams wrote:
> How's that for coincidence? I had just finally (on the 18th) got
> around to watching Ian's "Trap a Better Mouse" talk and starting
> to try it myself, and then saw that you posted this. I've done
> some parsing before, so you haven't (yet) covered any of the parts
> that I had trouble understanding. I *think* I've figured them out.
> Now I have to finish implementing it and see if I'm right.


In any case, they should come soon.  First I'll finish the recogniser
(I have to correct the empty rules bug using Aycock & Horsepool
(that's easy), and optimise right recursive rules using Leo (haven't
figured that out yet).


> Mainly the recognizer. Did you figure that out? I think the
> important insight is that matches added by scanning and completion
> operations represent a (partial) derivation step.

Yes, that was a big "click" moment for me.  Seeing things like
simplify the whole model, but it's not obvious.  That's why I included
a page on chart parsing.


> So your left back-pointer gives the history of the rule that was
> advanced, and your right back-pointer gives a rule which was applied
> just before the cursor (but only if it points to a complete match).

Actually, you don't need the back pointers.  Plain Earley items are
enough.  Even better, you don't need all the items.  You only need the
completed ones.


> So then you should just be able to do a depth-first traversal of
> that tree?  Start with matches for the start symbol at the first
> position in the final Earley state. For a left-most derivation,
> follow the left back-pointer first.

This business with back pointers is needlessly complicated, I found.
Plus, the way Earley items are organised, the search begins from the
end of the rules, not from their beginning.  This causes problems with
disambiguation.

The easiest strategy is the following: first, ignore any Earley item
that's not completed.  Second, flip the completed states.

Flipping a completed state is easy:  For each item in S(i) of the form
A -> α • (j), remove the item and place A -> α • (i) in S(j).  You
basically flip the start and end positions: that way, instead of
storing the partial parses where they end, you store them where they
begin.

From there, you can perform a search.  Since all the items are
completed, you *know* there is a reduction for whatever partial parse
you are examining.


> If you want the prioritised-choice version, then you need to make
> a choice each time you have multiple pairs of back-pointers. Look
> at the right back-pointer of each and choose the one with the rule
> earliest in the priority chain.

Again, no need for back pointers.  But I haven't completely solved the
prioritised choice.  See, there are two kinds of ambiguity.  The
first:

   A -> α • (i)
   A -> β • (i)

Is easily solved through top-down prioritised choice (just take
whatever rule comes first).  But there's another, more annoying
ambiguity:

   A -> α • (i)
   A -> α • (j)

Sometimes, the same rule can span different lengths.  That's typical
of repetition rules.  So you have two choices:

1. Just apply the longest match rule.
2. Dig both rules in parallel, search for the "topmost, first"
   difference (whatever that means), and apply prioritised choice
   there.

Choice 1 is easy, but choice 2 gives more control to the end user.  I
have yet to decide which is best in practice.  (Besides, I'm not sure
how to implement choice 2 in the first place.  It probably involves
some form of breadth first search.)


> I've been doing mine in Javascript and keeping an annotated
> version as HTML -- sort of a literate program. So when I get it
> finished, maybe I'll post that as well...we'll see how it turns
> out.

Even if our works are redundant, it will be worth it: first it help to
see things from different angles.  Second, some people may prefer one
explanation over the other.  Third, if we steal from each other, we're
likely to come up with something better than either of us could have
done alone.

Whatever you do, I'm interested.

Loup.
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc

Re: [fonc] Earley Parsing Explained (incomplete first draft)

Reply via email to