Before getting into alternatives, I think it's useful to explain why the
SLIF currently lexes the way it does.
First, note that Marpa not only parses any BNF, but usually does so
in linear time. I suspect one reason for the resistance to Marpa is
that when people mention this to experts, the experts "know" it is
impossible, and they dismiss Marpa as one in a long series of parsing
fads.
In fact, Marpa can be as bad as O(n**3), but it's linear for every
single one of the subclasses of BNF listed in Wikipedia
<http://en.wikipedia.org/wiki/Context-free_grammar#Subclasses>: LR(k),
LL(k) for all k, etc., etc. You can make Marpa go cubic, but you have
to go out of your way to do it. If your grammar is unambiguous, the
worst you can make Marpa do is O(n**2), and for that you've really got to
try hard. (Hint: just doing left- and right-recursions, even lots of
them in complex patterns, will not be enough.)
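To make "going out of your way" concrete: the classic route to cubic
behavior is massive ambiguity. Under a toy grammar like
S ::= S S | 'a', a string of n 'a's has a Catalan number of distinct
parses, and it is that kind of combinatorial explosion -- not mere
recursion -- that can push an Earley-family parser like Marpa toward
its cubic worst case. A back-of-the-envelope parse count in Python
(just arithmetic, nothing to do with Marpa's own code):

    from math import comb

    def parse_count(n):
        # Parses of 'a' * n under S ::= S S | 'a' -- the (n-1)th
        # Catalan number.
        return comb(2 * (n - 1), n - 1) // n

    print([parse_count(n) for n in range(1, 8)])
    # [1, 1, 2, 5, 14, 42, 132] -- growth on the order of 4**n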
OK, so what's all this got to do with lexing?
Well, suppose I do lexing, and I use Marpa's knowledge of what tokens to
expect. It is possible for a user to write a lexer which matches all
the way to the end of the string, only to find that the lexeme is not
acceptable to the grammar. The lexer would then backtrack until it
found an acceptable lexeme, perhaps one just 3 characters long. The
lexer might do this again and again, and the result would be
quadratic -- O(n**2).
With forgiving longest-tokens-match, it's not only possible to make
Marpa go quadratic -- it is easy to do by accident. And it is not hard
to do it in a way that works fine in your test cases, but goes quadratic
when a real-world example hits it.
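To see the shape of the problem, here is that pathological pattern as
a toy sketch in Python (my illustration only -- not Marpa's code and
not any real user's lexer):

    def backtracking_lex(input_str, pos, acceptable):
        # Longest-tokens-match with backtracking: try the longest
        # candidate first, then back off one character at a time
        # until the grammar accepts.  'acceptable' stands in for
        # Marpa's knowledge of which tokens it expects.
        for end in range(len(input_str), pos, -1):
            lexeme = input_str[pos:end]
            if acceptable(lexeme):
                return lexeme, end
        raise ValueError("no acceptable lexeme at position %d" % pos)

    def lex_all(input_str, acceptable):
        pos, lexemes = 0, []
        while pos < len(input_str):
            lexeme, pos = backtracking_lex(input_str, pos, acceptable)
            lexemes.append(lexeme)
        return lexemes

    # If only 3-character lexemes are ever acceptable, every call
    # scans almost to the end of the string before backing off.
    lex_all('abc' * 1000, lambda s: len(s) == 3)

Each call does work proportional to the remaining input, so the n/3
calls cost on the order of n**2 in total.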
Quadratic, when it comes to parsing, is very close in meaning to
"unusable".
So in this context, you can see unforgiving longest-tokens-matching
as a kind of "fast fail": the lexers it trips up are potential
candidates for quadratic behavior.
"Potential candidates?", you may ask. "This means unforgiving LTM is
preventing me from using some perfectly fine lexers, doesn't it?" Well,
ok, yes.
One feature I've considered adding is the ability to mark a token
"forgiving". That tells Marpa, when that token is rejected by the
grammar, not to fail, but to try shorter tokens. This would have the
advantage of being selective, and it forces the user to mark the
potential problems: if you hit speed issues with the SLIF, you go
back and look at the tokens you marked forgiving.
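In the toy framework sketched above, the idea might look like this --
purely a design sketch, not an implemented SLIF API:

    def lex_one(input_str, pos, acceptable, forgiving):
        # 'forgiving' is the hypothetical per-token mark.  A
        # forgiving token may back off to shorter matches when the
        # grammar rejects it; any other token fails fast on the
        # first rejection.
        for end in range(len(input_str), pos, -1):
            lexeme = input_str[pos:end]
            if acceptable(lexeme):
                return lexeme, end
            if not forgiving:
                # Unforgiving longest-tokens-matching: fast fail.
                raise ValueError(
                    "lexeme rejected at position %d" % pos)
        raise ValueError("no acceptable lexeme at position %d" % pos)

The backtracking, and with it any quadratic risk, stays confined to
the tokens explicitly marked forgiving.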
Many years ago, I was a student of Ned Irons, who wrote the first
paper to describe a parser (1961). Ned suggested I look into lexing,
which, he noted, was actually an interesting field. At the time, I
could not imagine what he was talking about, and I didn't ask him any
follow-up questions. That was a mistake. Some of this work might have
happened three decades ago.
-- jeffrey