Re: [fonc] Earley Parsing Explained (incomplete first draft)

2014-09-21 Thread Josh Grams
On 2014-09-20 07:59PM, Loup Vaillant-David wrote:
By default, Earley recognisers store their Earley items in such a way
that reconstruction (or back pointer following) happens from right to
left.  Which means the first ambiguities will be detected at the *end*
of the parse.

But a depth-first traversal turns that around.

In parsing expression grammars, it's very intuitive.  Imagine you
parse this rule:

  A - B C D

So you parse B, then C, then D (they are written in that order, after
all).  If you detect an ambiguity in B (or anything below it),
prioritised choice will take care of it.  Now imagine there are two
ways to parse the following input:

   _B_ _C_ _D_-- first  possibility
  /   \   /   \   /   \
  a   b   c   d   e   f   -- input
  \___/   |   \___/
  B   C D -- second possibility

The left back pointers move back through the rule. So a pre-order
traversal will go like this (I'm using tildes because it was
annoying to copy/paste the bullet):

A - B C D ~  -- at this step you apply A = B C D.
A - B C ~ D
A - B ~ C D
A - ~ B C D
No more left pointers, so return back up to A - B ~ C D.
Follow its right pointer to B ... ambiguous! So choose between:

B - a b ~
B - a b c ~

Then return back up to A - B C ~ D and follow its right pointer
to deal with C, etc.

--Josh
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


Re: [fonc] Earley Parsing Explained (incomplete first draft)

2014-09-20 Thread Josh Grams
On 2014-09-20 06:58AM, Josh Grams wrote:
Mainly the recognizer. Did you figure that out? I think the
important insight is that matches added by scanning and completion
operations represent a (partial) derivation step.

Gah. Not scanning. What am I saying? Only completion. The completion
operation represents a rule being applied all the places where it
presently fits. Yes? So if we save those pointers we can just hook them
up starting from the root(s) to get the parse forest?

Sorry, I've only spent three or four hours looking at Earley parsing:
I'm not all the way up to speed yet.

--Josh
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


Re: [fonc] Earley Parsing Explained (incomplete first draft)

2014-09-20 Thread Loup Vaillant-David
On Sat, Sep 20, 2014 at 11:57:58AM -0400, Josh Grams wrote:
 On 2014-09-20 02:27PM, Loup Vaillant-David wrote:
 Actually, you don't need the back pointers.  Plain Earley items are
 enough.  Even better, you don't need all the items.  You only need the
 completed ones.
 
 Sure, it's just a classic space/time tradeoff.

Not exactly.  My current guess is that not using back pointer yields
simpler code.  Reconstruction is simpler than back pointers
management.  Performance (memory or CPU) is secondary.


 This business with back pointers is needlessly complicated, I found.
 
 Hmm...did you try drawing out where they point to?

I did.  It gave me much insight about the structure of ambiguity.  And
I eventually found the main problem with back pointers: they point
*back*.

By default, Earley recognisers store their Earley items in such a way
that reconstruction (or back pointer following) happens from right to
left.  Which means the first ambiguities will be detected at the *end*
of the parse.

In parsing expression grammars, it's very intuitive.  Imagine you
parse this rule:

  A - B C D

So you parse B, then C, then D (they are written in that order, after
all).  If you detect an ambiguity in B (or anything below it),
prioritised choice will take care of it.  Now imagine there are two
ways to parse the following input:

   _B_ _C_ _D_-- first  possibility
  /   \   /   \   /   \
  a   b   c   d   e   f   -- input
  \___/   |   \___/
  B   C D -- second possibility

The parsing expression grammar will first solve the ambiguity at the B
level.  Once B is parsed, there is only one possibility.  Now with
Earley items, since you start from the end, you will first deal with
D, then with C—

Aha! there's an ambiguity!  So you solve it at C's level, leaving only
one possibility for B.  So, instead of doing the prioritised choice at
B, where a user would expect it, you do it at C, quite possibly
yielding a different choice.

---

So I knew I had to do something to start from the beginning, instead of
from the end.

With the back pointers, it meant transforming the back pointers into
forward pointers.  Say you have Three items, A, B and C.  B has a back
pointer to A, and that back pointer has a down pointer to C.  You need
to add a forward pointer to A that point to B.  And that forward
pointer needs a down pointer that points to C.  So you go from:

  A --+--- B
   V
   C
To

  A ---+-- B
   V
   C

And of course, those pointers are not unique, so you need to maintain
a *list* of those…  So I just threw up my arms, and tried another
approach.  I thought for a moment that we *needed* those back
pointers.  But then I have read that paper about chart parsing, and
realise you could do everything with completed items alone.

This has 2 advantages:

  - Completed Earley items are simpler than the others, and very easy
to reverse.  (Unlike the back pointer stuff.)
  - No need to modify the recogniser to have your parser.  They are
completely independent, making them easier to learn.

---

TL;DR: ignore Elizabeth Scott's SPPF-style Parsing for Earley
Recognisers.  It's not useful when you only want to select one parse
tree.

Loup.
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


[fonc] Earley Parsing Explained (incomplete first draft)

2014-09-18 Thread Loup Vaillant-David
Hi,

After spending months banging my head over Earley Parsing, I have
decided to write a tutorial.  Ian once said Earley parsing is simple
and easy to implement.  I agree with simple, but not with easy.
The required background knowledge is not trivial.

This tutorial is an attempt to gather this knowledge in one place.
Nothing fancy, no deep math, no proof of correctness.  Just a
(hopefully) intuitive explanation of the concepts, needed to implement
Earley parsing.  The goal is to help competent programmers who know
little about parsing to write their own Earley parsing framework.

So far, I have done most of the recogniser.  The following pages are
done, and up for review:

http://loup-vaillant.fr/tutorials/earley-parsing/
http://loup-vaillant.fr/tutorials/earley-parsing/what-and-why
http://loup-vaillant.fr/tutorials/earley-parsing/chart-parsing
http://loup-vaillant.fr/tutorials/earley-parsing/recogniser

Questions and criticisms are most welcome.  I'd like to know about any
factual inaccuracy, poor wording, confusing explanation… please don't
hesitate to question anything, even the structure of this tutorial.

Enjoy, and thanks,
Loup.
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


Re: [fonc] Earley Parsing Explained (incomplete first draft)

2014-09-18 Thread Julian Leviston
On 19 Sep 2014, at 7:11 am, Loup Vaillant-David l...@loup-vaillant.fr wrote:

 Hi,
 
 After spending months banging my head over Earley Parsing, I have
 decided to write a tutorial.  Ian once said Earley parsing is simple
 and easy to implement.  I agree with simple, but not with easy.
 The required background knowledge is not trivial.
 

Easy is pretty subjective! What's easy to my a 6 year old is not going to be 
easy for a 50 year old, and vice versa.
Perhaps you mean approachable to someone of average intellect and cultural 
heritage?

 This tutorial is an attempt to gather this knowledge in one place.
 Nothing fancy, no deep math, no proof of correctness.  Just a
 (hopefully) intuitive explanation of the concepts, needed to implement
 Earley parsing.  The goal is to help competent programmers who know
 little about parsing to write their own Earley parsing framework.
 
 So far, I have done most of the recogniser.  The following pages are
 done, and up for review:
 
 http://loup-vaillant.fr/tutorials/earley-parsing/
 http://loup-vaillant.fr/tutorials/earley-parsing/what-and-why
 http://loup-vaillant.fr/tutorials/earley-parsing/chart-parsing
 http://loup-vaillant.fr/tutorials/earley-parsing/recogniser
 
 Questions and criticisms are most welcome.  I'd like to know about any
 factual inaccuracy, poor wording, confusing explanation… please don't
 hesitate to question anything, even the structure of this tutorial.
 
 Enjoy, and thanks,
 Loup.


I really like your writing style!

One criticism I'd have is there aren't any pointers to people who don't have 
programming language construction backgrounds. So, perhaps adding a 
pre-requisite reading list at the beginning aimed at bringing someone who has 
finished high school and nothing else up to speed...

This is perhaps my general criticism of most technical work. The assumed 
knowledge is high, and it shouldn't be. As more and more people get involved in 
computing, this means there can be less and less assumed backgrounds. I wrote 
about good code and by extension good writing a while back here: 
http://www.genericoverlords.com/good_coding

Julian

http://www.getcontented.com.au/ - You Need GetContented - Get Your Website 
Happy. :)___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc