Undo's

Jeffrey Kegler Sat, 18 Jan 2014 18:26:21 -0800

I am working at adding "undo's" to Libmarpa -- the ability, while a parse
is in progress, to go back and erase or "undo" some of the rules it
has recognized.


Potential applications include zero-width assertions. Zero-width
assertions are things like word boundaries and line endings, etc.
Regexes think character by character and allow zero-width assertions about
adjacent characters in an easy and general way.  Regexes aren't really
parsers -- they're state machines that run deal in character sequences.
But if the problem you're solving solves easily that way (and many do),
then regexes are perfect.

Marpa is for languages that regexes won't handle well.  And everything
regexes do, Marpa can do today, though it may require a different
approach.  But there are cases where you want Marpa to, just for a moment,
think like a regex engine.  You want to be able to put zero-width
assertions into your BNF, just as if your BNF rule was a regex.
Undo's will make this possible.

A probable first application of undo's will be to the SLIF' s
"forgiving" lexemes.  These were recently introduced to Marpa and
allow, in addition to the traditional longest-tokens-matching (LTM),
longest-acceptable-tokens-matching (LATM).  That is, if you use forgiving
tokens, Marpa can be smart about lexing, and use the context of the parse
to determine which lexemes are or are not relevant.  But in the current
implementation, it's safest to use LATM on a token by token basis -- in
some cases forgiving tokens can cause an efficiency hit.  With undo's,
Marpa will be able to do LATM efficiently in the general case.

Undo's will be a by-product of the Marpa parse engine rewrite currently
underway.  Undo's are, of course, on a rule by rule basis.  But Marpa's
parse engine had used a technique from Aycock & Horspool's 2002 paper,
and tracked, not rules and positions in them, but sets of rule-positions
grouped in "LR(0) states".

At this point I have considerable experience with the use of LR(0)
states for Earley parsing and I believe that in practice and on average,
when it comes to speed, their use is without benefit.  In theoretical,
time complexity, terms, LR(0) states never offered any gains, so
that the potential gains were always limited.  And implementing LR(0)
states within Earley's algorithm was suprisingly complex, and resulted
in a parser that was far harder to understand, maintain and extend.
So Marpa's parse engine is going back to the traditional Earley method
of tracking progress by rule and position.

As long as Marpa's parse engine used LR(0) states, implementing undo's
was possible, but daunting.  Now that the parse engine works directly
using rules, the road to undo's becomes much easier.

--
You received this message because you are subscribed to the Google Groups "marpa 
parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

Undo's

Reply via email to