Before getting into alternatives, I think it's useful to explain why the SLIF currently lexes the way it does.

First, note that not only does Marpa parse any BNF, but it usually does so in linear time. I suspect one reason for the resistance to Marpa is that when people mention this to experts, the experts "know" it is impossible, and they dismiss Marpa as one in a long series of parsing fads.

In fact, Marpa can be as bad as O(n**3), but it is linear for every one of the subclasses of BNF listed in Wikipedia <http://en.wikipedia.org/wiki/Context-free_grammar#Subclasses>: LR(k) and LL(k) for all k, and so on. You can make Marpa go cubic, but you have to go out of your way to do it. If your grammar is unambiguous, the worst you can make Marpa do is O(n**2), and for that you've really got to try hard. (Hint: just doing left- and right-recursions, even lots of them in complex patterns, will not be enough.)
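For the curious, the standard way to hit the cubic worst case for an Earley-family parser like Marpa is a maximally ambiguous grammar, the textbook example being S ::= S S | 'a'. On an input of n 'a's, the number of parse trees grows exponentially with n, and recognition takes O(n**3). (That grammar is for illustration; it is not from any Marpa test suite.)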

OK, so what's all this got to do with lexing?

Well, suppose I do lexing, and I use Marpa's knowledge of what tokens to expect. It is possible for a user to write a lexer which lexes all the way to the end of the string, only to find that one or more of its lexemes is not acceptable to the grammar. The lexer would then backtrack until it found an acceptable lexeme, perhaps one just 3 characters long. The lexer might do this again and again, and the result would be quadratic -- O(n**2).
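To make that concrete, here is a minimal sketch of such a backtracking lexer, in Python. Everything in it is illustrative, not SLIF code: match_longest() and is_acceptable() are hypothetical stand-ins for the real pattern matcher and for asking the Marpa recognizer which tokens it expects.

    # Sketch of a backtracking longest-match lexer.
    # match_longest(input, pos) returns the length of the longest
    # lexeme matching at pos; is_acceptable(lexeme) stands in for
    # asking the grammar whether it can use that lexeme here.
    def lex(input, match_longest, is_acceptable):
        pos = 0
        tokens = []
        while pos < len(input):
            # Greedily match the longest possible lexeme at pos ...
            length = match_longest(input, pos)
            # ... then back off, one character at a time, until
            # the grammar accepts what we found.
            while length > 0 and not is_acceptable(input[pos:pos + length]):
                length -= 1
            if length == 0:
                raise ValueError("no acceptable lexeme at position %d" % pos)
            tokens.append(input[pos:pos + length])
            pos += length
        return tokens

If match_longest() routinely runs far ahead -- say, to the end of the string -- and the acceptable lexeme turns out to be short, each pass does O(n) work over O(n) token positions, and the whole lex is O(n**2).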

With forgiving longest-tokens-match, it's not only possible to make Marpa go quadratic -- it is easy to do by accident. And it is not hard to do it in a way that works fine in your test cases, but goes quadratic when a real-world example hits it.

Quadratic, when it comes to parsing, is very close in meaning to "unusable".

So in this context, you can see unforgiving longest-tokens-matching as a kind of "fast fail". The lexers that run afoul of it are potential candidates for quadratic behavior.

"Potential candidates?", you may ask. "This means unforgiving LTM is preventing me from using some perfectly fine lexers, doesn't it?" Well, ok, yes.

One of the features I've considered adding is the ability to mark a token "forgiving". That tells Marpa, when that token is rejected by the grammar, not to fail but to try shorter tokens. This has the advantage that it is selective, and it forces the user to mark the potential problems: if you hit speed issues with the SLIF, you go back and look at the tokens you marked forgiving.
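Continuing the Python sketch from above, the lexing loop might change along these lines. The "forgiving" marking itself is only a proposed feature, and token_name() and forgiving_names are hypothetical:

    # Variation: only tokens the user has marked "forgiving" may
    # back off to shorter matches; everything else fails fast.
    def lex_selective(input, match_longest, is_acceptable,
                      token_name, forgiving_names):
        pos = 0
        tokens = []
        while pos < len(input):
            length = match_longest(input, pos)
            if not is_acceptable(input[pos:pos + length]):
                if token_name(input, pos, length) not in forgiving_names:
                    # Unforgiving LTM: the "fast fail".
                    raise ValueError("lexeme rejected at position %d" % pos)
                # Forgiving token: retry shorter matches, so any
                # quadratic risk is confined to tokens marked as such.
                while length > 0 and not is_acceptable(input[pos:pos + length]):
                    length -= 1
                if length == 0:
                    raise ValueError("no acceptable lexeme at position %d" % pos)
            tokens.append(input[pos:pos + length])
            pos += length
        return tokens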

Many years ago, I was the student of Ned Irons, who wrote the first paper to describe a parser (1961). Ned suggested I look into lexing, which, he noted, was actually an interesting field. At the time, I could not imagine what he was talking about, and I didn't ask him any followup questions. That was a mistake. Some of this work might have happened three decades ago.

-- jeffrey
