On Tue, 2006-03-07 at 15:19 +0100, Nicolas Cannasse wrote: > > However this form of stream parsing is quite weak. > > [It can't even parse regular expressions] > > I think it can.
AFAIK no LL parser can do it, not even with infinite backtracking. The problem is that recursive parsers can only backtrack to the last failure point and try the next alternative. Proper backtracking requires restarting at the previous choice point INCLUDING success points. That can handle arbitrary non-left recursive context free grammars, and it called an Earley Parser AFAIK. GLR and GLL are the modern equivalents, only they don't backtrack, they persue alternatives simultaneously, and they can handle ANY grammar. > Can you give a more precise sample of the problem ? The classic from the Dragon book: (a|b)*abb$ is indicative of the problem. It isn't good enough to backtrack to the last failure point and try the next alternative .. because (a|b)* never fails. For your stream parser, this is evident from the rule <loop> ::= (a|b) <loop> <re> ::= <loop> abb$ The problem is <loop> goobles up the whole stream, and junks every single token: there are no earlier choice points where you can backtrack after failure an try again. This is because your parser makes every subrule descent a successful cut point which you can't backtrack past. In this case you can fix it by unrolling the loop twice (or three times .. not sure). However if you nest it in one more loop, you're stuffed. Of course if you ANALYSE the regexp, you will get a finite state machine and you can always write an LL1 grammar for that. -- John Skaller <skaller at users dot sourceforge dot net> Async PL, Realtime software consultants Checkout Felix: http://felix.sourceforge.net -- Neko : One VM to run them all (http://nekovm.org)
