After a good night's sleep, I think longest-expected-token match may not be that bad.
On Tue, Jan 7, 2014 at 5:10 AM, Jeffrey Kegler <[email protected]> wrote:

> Before getting into alternatives, I think it's useful to explain why the
> SLIF currently lexes the way it does.
>
> First note that not only does Marpa parse any BNF, but usually it does it
> in linear time. I suspect one reason for the resistance to Marpa is that
> when people mention this to experts, the experts know this is impossible,
> and they dismiss Marpa as one in a long series of parsing fads.
>
> In fact, Marpa can be as bad as O(n**3), but it's linear for every single
> one of the subclasses of BNF listed in Wikipedia
> <http://en.wikipedia.org/wiki/Context-free_grammar#Subclasses>:
> LR(k), LL(k) for all k, etc., etc. You can make Marpa go cubic, but you
> have to go out of your way to do it. If your grammar is unambiguous, the
> worst you can make Marpa do is O(n**2), and for that you've really got to
> try hard. (Hint: just doing left- and right-recursions, even lots of them
> in complex patterns, will not be enough.)
>
> OK, so what's all this got to do with lexing?
>
> Well, suppose I do lexing, and I use Marpa's knowledge of what tokens to
> expect. It is possible for a user to write a lexer which lexes all the way
> to the end of the string and finds one or more lexemes not acceptable to
> the grammar. The lexer would then backtrack until it found an acceptable
> lexeme, perhaps just 3 characters long. The lexer might do this again and
> again, and the result would be quadratic -- O(n**2).
>
> With forgiving longest-tokens-match, it's not only possible to make Marpa
> go quadratic -- it is easy to do by accident. And it is not hard to do it
> in a way that works fine in your test cases, but goes quadratic when a
> real-world example hits it.
>
> Quadratic, when it comes to parsing, is very close in meaning to
> "unusable".
>
> So in this context, you can see unforgiving longest-tokens-matching as a
> kind of "fast fail".
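The backtracking scenario described above is easy to demonstrate with a toy sketch (invented for illustration; this is not Marpa's or the SLIF's actual code). The lexer below always tries the longest candidate first and shrinks it one character at a time when the grammar rejects it; if the grammar only ever accepts short tokens, the total work is quadratic:

```python
# Hypothetical sketch of a forgiving longest-match lexer that
# backtracks on rejection. `acceptable` stands in for the grammar's
# judgment of a candidate lexeme; it is an invented callback.

def forgiving_lex(text, acceptable):
    """Tokenize `text`, longest candidate first, shrinking on
    rejection. Returns the tokens and a count of candidate checks."""
    pos = 0
    tokens = []
    work = 0                      # number of candidates examined
    while pos < len(text):
        # Longest candidate first: everything up to end of string.
        for end in range(len(text), pos, -1):
            work += 1
            candidate = text[pos:end]
            if acceptable(candidate):
                tokens.append(candidate)
                pos = end
                break
        else:
            raise ValueError(f"no acceptable token at position {pos}")
    return tokens, work

# Worst case: only 1-character tokens are acceptable, so every
# position backtracks all the way down -- O(n**2) candidate checks.
text = "a" * 100
tokens, work = forgiving_lex(text, lambda t: len(t) == 1)
```

With a 100-character input, the lexer performs 100 + 99 + ... + 1 = 5050 candidate checks. A short test case hides this growth, which is exactly the "works fine in your test cases, goes quadratic in the real world" trap.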
> The lexers which cause it a problem are potential
> candidates for quadratic behavior.
>
> "Potential candidates?", you may ask. "This means unforgiving LTM is
> preventing me from using some perfectly fine lexers, doesn't it?" Well,
> ok, yes.
>
> One of the features I've considered adding is to mark a token
> "forgiving". That tells Marpa, when that token is rejected by the grammar,
> not to fail, but to try shorter tokens. This would have the advantage that
> it is selective, and it forces the user to mark the potential problems.
> Any speed issues with the SLIF, and you go back and look at the tokens you
> marked forgiving.
>
> Many years ago, I was the student of Ned Irons, who wrote the first paper
> to describe a parser (1961). Ned suggested I look into lexing, which, he
> noted, was actually an interesting field. At the time, I could not imagine
> what he was talking about. I didn't ask him any followup questions. That
> was a mistake. Some of this work might have happened 3 decades ago.
>
> -- jeffrey
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

--
Best regards,
Ruslan.
