On Tue, 2006-03-07 at 15:19 +0100, Nicolas Cannasse wrote:
> > However this form of stream parsing is quite weak.
> > [It can't even parse regular expressions]
> 
> I think it can.

AFAIK no LL parser can do it, not even with
infinite backtracking.

The problem is that recursive parsers can
only backtrack to the last failure point
and try the next alternative.

Proper backtracking requires restarting at the
previous choice point INCLUDING success points.

That can handle arbitrary non-left recursive
context free grammars, and it called an Earley Parser AFAIK.

GLR and GLL are the modern equivalents, only
they don't backtrack, they persue alternatives
simultaneously, and they can handle ANY grammar.

> Can you give a more precise sample of the problem ?

The classic from the Dragon book:

(a|b)*abb$

is indicative of the problem. It isn't good enough
to backtrack to the last failure point and try
the next alternative .. because (a|b)* never fails.

For your stream parser, this is evident from the rule

<loop> ::= (a|b) <loop> 
<re> ::= <loop> abb$

The problem is <loop> goobles up the whole stream,
and junks every single token: there are no earlier
choice points where you can backtrack after failure
an try again. This is because your parser makes every
subrule descent a successful cut point which you can't
backtrack past.

In this case you can fix it by unrolling the loop
twice (or three times .. not sure).

However if you nest it in one more loop, you're stuffed.

Of course if you ANALYSE the regexp, you will get a
finite state machine and you can always write an
LL1 grammar for that.

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


-- 
Neko : One VM to run them all
(http://nekovm.org)

Reply via email to