After a good night's sleep, I think longest-expected-token match may not be that bad.
On Tue, Jan 7, 2014 at 5:10 AM, Jeffrey Kegler <[email protected]> wrote:

> Before getting into alternatives, I think it's useful to explain why the
> SLIF currently lexes the way it does.
>
> First note that not only does Marpa parse any BNF, but usually it does it
> in linear time. I suspect one reason for the resistance to Marpa is that
> when people mention this to experts, the experts know this is impossible,
> and they dismiss Marpa as one in a long series of parsing fads.
>
> In fact, Marpa can be as bad as O(n**3), but it's linear for every single
> one of the subclasses of BNF listed in Wikipedia
> <http://en.wikipedia.org/wiki/Context-free_grammar#Subclasses>:
> LR(k), LL(k) for all k, etc., etc. You can make Marpa go cubic, but you
> have to go out of your way to do it. If your grammar is unambiguous, the
> worst you can make Marpa do is O(n**2), and for that you've really got to
> try hard. (Hint: just doing left- and right-recursions, even lots of them
> in complex patterns, will not be enough.)
>
> OK, so what's all this got to do with lexing?
>
> Well, suppose I do lexing, and I use Marpa's knowledge of what tokens to
> expect. It is possible for a user to write a lexer which lexes all the way
> to the end of the string and finds one or more lexemes not acceptable to
> the grammar. The lexer would then backtrack until it found an acceptable
> lexeme, perhaps just 3 characters long. The lexer might do this again and
> again, and the result would be quadratic -- O(n**2).
>
> With forgiving longest-tokens-match, it's not only possible to make Marpa
> go quadratic -- it is easy to do by accident. And it is not hard to do it
> in a way that works fine in your test cases, but goes quadratic when a
> real-world example hits it.
>
> Quadratic, when it comes to parsing, is very close in meaning to
> "unusable".
>
> So in this context, you can see unforgiving longest-tokens-matching as a
> kind of "fast fail".
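The backtracking scenario described above is easy to demonstrate with a toy sketch (invented for illustration; this is not Marpa's or the SLIF's actual code). The lexer below always tries the longest candidate first and shrinks it one character at a time when the grammar rejects it; if the grammar only ever accepts short tokens, the total work is quadratic:

```python
# Hypothetical sketch of a forgiving longest-match lexer that
# backtracks on rejection. `acceptable` stands in for the grammar's
# judgment of a candidate lexeme; it is an invented callback.

def forgiving_lex(text, acceptable):
    """Tokenize `text`, longest candidate first, shrinking on
    rejection. Returns the tokens and a count of candidate checks."""
    pos = 0
    tokens = []
    work = 0                      # number of candidates examined
    while pos < len(text):
        # Longest candidate first: everything up to end of string.
        for end in range(len(text), pos, -1):
            work += 1
            candidate = text[pos:end]
            if acceptable(candidate):
                tokens.append(candidate)
                pos = end
                break
        else:
            raise ValueError(f"no acceptable token at position {pos}")
    return tokens, work

# Worst case: only 1-character tokens are acceptable, so every
# position backtracks all the way down -- O(n**2) candidate checks.
text = "a" * 100
tokens, work = forgiving_lex(text, lambda t: len(t) == 1)
```

With a 100-character input, the lexer performs 100 + 99 + ... + 1 = 5050 candidate checks. A short test case hides this growth, which is exactly the "works fine in your test cases, goes quadratic in the real world" trap.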
> The lexers which cause it a problem are potential
> candidates for quadratic behavior.
>
> "Potential candidates?", you may ask. "This means unforgiving LTM is
> preventing me from using some perfectly fine lexers, doesn't it?" Well,
> ok, yes.
>
> One of the features I've considered adding is to mark a token
> "forgiving". That tells Marpa, when that token is rejected by the grammar,
> not to fail, but to try shorter tokens. This would have the advantage that
> it is selective, and it forces the user to mark the potential problems.
> Any speed issues with the SLIF, and you go back and look at the tokens you
> marked forgiving.
>
> Many years ago, I was the student of Ned Irons, who wrote the first paper
> to describe a parser (1961). Ned suggested I look into lexing, which, he
> noted, was actually an interesting field. At the time, I could not imagine
> what he was talking about. I didn't ask him any followup questions. That
> was a mistake. Some of this work might have happened 3 decades ago.
>
> -- jeffrey
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

--
Best regards,
Ruslan.
