Yes, you're on the right track. There is really a vast potential for
undo's -- the immediate reason that I rewrote the parse engine was to
make them easier to implement, which I wouldn't have done for any
ordinary new feature.
Here's how they relate to "forgiving lexemes": Right now if a lexeme is
not acceptable to G1, but is "forgiven", the lexer backs up and looks
for an acceptable lexeme. With undos, I can start the lexer, but then
immediately "undo" the predictions of any lexemes which will not be
acceptable to G1. These lexemes won't even be looked for, which is more
efficient.
The ability for the parser to say, "you know, on second thought, never
mind" is useful in all sort of contexts, many of which I hope will occur
to others before they occur to me.
Incidentally, phase 3 will offer a limited ability to "undo", targeted
at the case I just mentioned -- forgiving predictions at location 0 in
order to implement "cheap forgiveness".
Undo's will also be available through the THIF, and those of you looking
at writing your own Marpa interfaces should be able to get a lot of
mileage out of them.
-- jeffrey
On 01/25/2014 05:22 AM, Ruslan Shvedov wrote:
I was thinking about use cases for Undo's and here is the braindump:
this use case for forgiving adverb
<https://gist.github.com/rns/8473261> to test for literals
depending on the symbol which follows them
MarpaX::Regex::Verbal I wrote about today in another thread on
this list
<https://groups.google.com/d/msg/marpa-parser/2TZn5nolyzk/CY597CfNT2IJ> to
implement look-ahead per Regex features directly rather than
invoke a regex for them.
47 /*_maybe/ rules in Jean-Damien's MarpaX::Languages::C::AST
<https://github.com/jddurand/MarpaX-Languages-C-AST/blob/master/lib/MarpaX/Languages/C/AST/Grammar/ISO_ANSI_C_2011.pm>
C grammar
Marpa-powered Lex: Lex regexes with Marpa-powered extensions
Am I on the right track?
On Sun, Jan 19, 2014 at 4:25 AM, Jeffrey Kegler
<[email protected]
<mailto:[email protected]>> wrote:
I am working at adding "undo's" to Libmarpa -- the ability, while
a parse
is in progress, to go back and erase or "undo" some of the rules it
has recognized.
Potential applications include zero-width assertions. Zero-width
assertions are things like word boundaries and line endings, etc.
Regexes think character by character and allow zero-width
assertions about
adjacent characters in an easy and general way. Regexes aren't really
parsers -- they're state machines that run deal in character
sequences.
But if the problem you're solving solves easily that way (and many
do),
then regexes are perfect.
Marpa is for languages that regexes won't handle well. And everything
regexes do, Marpa can do today, though it may require a different
approach. But there are cases where you want Marpa to, just for a
moment,
think like a regex engine. You want to be able to put zero-width
assertions into your BNF, just as if your BNF rule was a regex.
Undo's will make this possible.
A probable first application of undo's will be to the SLIF' s
"forgiving" lexemes. These were recently introduced to Marpa and
allow, in addition to the traditional longest-tokens-matching (LTM),
longest-acceptable-tokens-matching (LATM). That is, if you use
forgiving
tokens, Marpa can be smart about lexing, and use the context of
the parse
to determine which lexemes are or are not relevant. But in the current
implementation, it's safest to use LATM on a token by token basis
-- in
some cases forgiving tokens can cause an efficiency hit. With undo's,
Marpa will be able to do LATM efficiently in the general case.
Undo's will be a by-product of the Marpa parse engine rewrite
currently
underway. Undo's are, of course, on a rule by rule basis. But
Marpa's
parse engine had used a technique from Aycock & Horspool's 2002 paper,
and tracked, not rules and positions in them, but sets of
rule-positions
grouped in "LR(0) states".
At this point I have considerable experience with the use of LR(0)
states for Earley parsing and I believe that in practice and on
average,
when it comes to speed, their use is without benefit. In theoretical,
time complexity, terms, LR(0) states never offered any gains, so
that the potential gains were always limited. And implementing LR(0)
states within Earley's algorithm was suprisingly complex, and resulted
in a parser that was far harder to understand, maintain and extend.
So Marpa's parse engine is going back to the traditional Earley method
of tracking progress by rule and position.
As long as Marpa's parse engine used LR(0) states, implementing undo's
was possible, but daunting. Now that the parse engine works directly
using rules, the road to undo's becomes much easier.
--
You received this message because you are subscribed to the Google
Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected]
<mailto:marpa-parser%[email protected]>.
For more options, visit https://groups.google.com/groups/opt_out.
--
You received this message because you are subscribed to the Google
Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
--
You received this message because you are subscribed to the Google Groups "marpa
parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.