Re: Regular expressions and closures (was: Perl 6 Summary for 2004-10-01 through 2004-10-17)

Luke Palmer Thu, 28 Oct 2004 01:33:01 -0700

Aaron Sherman writes:
> >   The current syntax for what you're
> > trying to write is:
> > 
> >     /ab(c|b) <($1 eq 'c')>/
> > 
> > which is equivalent to
> > 
> >     /ab(c|b) {fail unless $1 eq 'c'}/
> 
> Now, what does "fail" mean? I can think of two definitions:
> 
>       1. proceed to trap state (backtracking then happens)
>       2. exit (probably using an exception) the rule?


The first one.

> > :  State 3 (start subexp mark, end subexp mark):
> > :           match(token => 'c', target => 4),
> > :           match(token => 'b', target => 4),
> > :           match(token => !['c','b'], target => trap)
> > 
> > I find this notation confusing.  In Perl 5 regex, the start and end are
> > considered separate states, and the end state would follow the matches,
> > or you wouldn't capture the "motion", and $1 would end up null.  But
> > then, I think in NFA, not DFA...
> 
> NDFAs (or NFAs as you say) 

Last time I checked, "nondeterministic" was a word all by itself. :-p

> should be the same as DFAs for this purpose.  I think it's just a
> difference in the way we look at the token-matching process. I see it
> as part of the state, so to say "we both begin and end subexpression
> storage here" is to say, "only the token matched here will be stored
> as the subexpression." This restricts you, though, and I may find that
> in order to say:
> 
>       (a)((b)c)
> 
> I need too much hair and want to put the markers elsewhere.... I'll know
> that by next week.

I've given this a lot of thought in the past.  Way too much, actually.
I think you'll find your DFA approach to fail when you deal with the
generic case of perl 6 rules.

DFA's are better because they're faster, right?  O(n) matching time.
But we also know that neither DFAs nor NFAs can model a context-free
language without some state that's not in the automaton.  It turns out
that it's really easy to tack that state on the side of an NFA.
However, if you want to add that state to a DFA, your DFA will grow
exponentially in number of states with the number of rules in your
grammar.  It's also not possible to derive from a grammar and still keep
that algorithm, unless you do lazy computation of your DFA.  But
computation of a DFA in the general case takes time proportional to the
square of the number of states, which is growing exponentially.  So
you're doing worse than recursive descent in that case.

DFAs can't adequately model context free grammars.  I think it's best
for your brain if you leave it at that.

Perl 6 also allows conditionals within rules.  So if you have an
alternation:

    / [ <( cond_1() )> | <( cond_2() )> | <( cond_3() )> ] /

You're aware that you need to make those conditions orthogonal, right?
You can only proceed to one of those states if you're sure that it's the
_only_ one that matches.  It turns out that this is going to take as
much time _all the time_ as the NFA takes in its worst case.

DFAs are a bad idea for Perl 6. 

On the other hand, optimizing sufficiently simple parts of regular
expressions to DFAs and embedding them in a more general system is a
pretty good idea, IMO.

Luke

Re: Regular expressions and closures (was: Perl 6 Summary for 2004-10-01 through 2004-10-17)

Reply via email to