Before getting into alternatives, I think it's useful to explain why the
SLIF currently lexes the way it does.
First, note that Marpa not only parses any BNF, but usually does so
in linear time. I suspect one reason for the resistance to Marpa is
that when people mention this to experts, the experts "know" it is
impossible, and they dismiss Marpa as one in a long series of parsing
fads.
In fact, Marpa can be as bad as O(n**3), but it's linear for every
single one of the subclasses of BNF listed in Wikipedia
<http://en.wikipedia.org/wiki/Context-free_grammar#Subclasses>: LR(k),
LL(k) for all k, etc., etc. You can make Marpa go cubic, but you have
to go out of your way to do it. If your grammar is unambiguous, the
worst you can make Marpa do is O(n**2), and for that you've really got to
try hard. (Hint: just doing left- and right-recursions, even lots of
them in complex patterns, will not be enough.)
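To make "going out of your way" concrete: the classic route to cubic
behavior is massive ambiguity. Under a toy grammar like
S ::= S S | 'a', a string of n 'a's has a Catalan number of distinct
parses, and it is that kind of combinatorial explosion -- not mere
recursion -- that can push an Earley-family parser like Marpa toward
its cubic worst case. A back-of-the-envelope parse count in Python
(just arithmetic, nothing to do with Marpa's own code):

    from math import comb

    def parse_count(n):
        # Parses of 'a' * n under S ::= S S | 'a' -- the (n-1)th
        # Catalan number.
        return comb(2 * (n - 1), n - 1) // n

    print([parse_count(n) for n in range(1, 8)])
    # [1, 1, 2, 5, 14, 42, 132] -- growth on the order of 4**n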
OK, so what's all this got to do with lexing?
Well, suppose I do lexing, and I use Marpa's knowledge of what tokens to
expect. It is possible for a user to write a lexer which matches all
the way to the end of the string, only to find that the lexeme is not
acceptable to the grammar. The lexer would then backtrack until it
found an acceptable lexeme, perhaps one just 3 characters long. The
lexer might do this again and again, and the result would be
quadratic -- O(n**2).
With forgiving longest-tokens-match, it's not only possible to make
Marpa go quadratic -- it is easy to do by accident. And it is not hard
to do it in a way that works fine in your test cases, but goes quadratic
when a real-world example hits it.
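To see the shape of the problem, here is that pathological pattern as
a toy sketch in Python (my illustration only -- not Marpa's code and
not any real user's lexer):

    def backtracking_lex(input_str, pos, acceptable):
        # Longest-tokens-match with backtracking: try the longest
        # candidate first, then back off one character at a time
        # until the grammar accepts.  'acceptable' stands in for
        # Marpa's knowledge of which tokens it expects.
        for end in range(len(input_str), pos, -1):
            lexeme = input_str[pos:end]
            if acceptable(lexeme):
                return lexeme, end
        raise ValueError("no acceptable lexeme at position %d" % pos)

    def lex_all(input_str, acceptable):
        pos, lexemes = 0, []
        while pos < len(input_str):
            lexeme, pos = backtracking_lex(input_str, pos, acceptable)
            lexemes.append(lexeme)
        return lexemes

    # If only 3-character lexemes are ever acceptable, every call
    # scans almost to the end of the string before backing off.
    lex_all('abc' * 1000, lambda s: len(s) == 3)

Each call does work proportional to the remaining input, so the n/3
calls cost on the order of n**2 in total.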
Quadratic, when it comes to parsing, is very close in meaning to
"unusable".
So in this context, you can see unforgiving longest-tokens-matching
as a kind of "fast fail": the lexers it trips up are potential
candidates for quadratic behavior.
"Potential candidates?", you may ask. "This means unforgiving LTM is
preventing me from using some perfectly fine lexers, doesn't it?" Well,
ok, yes.
One feature I've considered adding is the ability to mark a token
"forgiving". That tells Marpa, when that token is rejected by the
grammar, not to fail, but to try shorter tokens. This would have the
advantage of being selective, and it forces the user to mark the
potential problems: if you hit speed issues with the SLIF, you go
back and look at the tokens you marked forgiving.
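In the toy framework sketched above, the idea might look like this --
purely a design sketch, not an implemented SLIF API:

    def lex_one(input_str, pos, acceptable, forgiving):
        # 'forgiving' is the hypothetical per-token mark.  A
        # forgiving token may back off to shorter matches when the
        # grammar rejects it; any other token fails fast on the
        # first rejection.
        for end in range(len(input_str), pos, -1):
            lexeme = input_str[pos:end]
            if acceptable(lexeme):
                return lexeme, end
            if not forgiving:
                # Unforgiving longest-tokens-matching: fast fail.
                raise ValueError(
                    "lexeme rejected at position %d" % pos)
        raise ValueError("no acceptable lexeme at position %d" % pos)

The backtracking, and with it any quadratic risk, stays confined to
the tokens explicitly marked forgiving.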
Many years ago, I was a student of Ned Irons, who wrote the first
paper to describe a parser (1961). Ned suggested I look into lexing,
which, he noted, was actually an interesting field. At the time, I
could not imagine what he was talking about, and I didn't ask him any
follow-up questions. That was a mistake. Some of this work might have
happened three decades ago.
-- jeffrey