In message <[EMAIL PROTECTED]>,
Robin Houston writes:
: When back referencing is allowed, you *can't* in general construct a
: DFA. Indeed, all DFA-recognisable languages are regular, and the
: language defined by /^([ab]*)\1$/ is not.
Right. Specifically, that language is context-sensitive. I don't have
my copy of the Linz book, but my recollection is that determining
whether a string is in a CSL is a problem in PSPACE (and maybe
PSPACE-complete). Can Perl regular expressions recognize all CSLs,
though? How would you match (a^n)(b^n)(c^n), i.e.,
('a' x $n) . ('b' x $n) . ('c' x $n)
for all $n > 0?
You're good with Context-Free Languages because the CYK algorithm (for
parsing CFLs) runs in O(n**3) time, but you were smart to rule out
(?{}) because that would put you in recursively enumerable languages
where membership is undecidable.
: However, for all the backreffing regexen that I can think of,
: there seems to be some way to check membership in polynomial time.
: Hence I'm wondering whether there's a membership algorithm (for
: regexes with back referencing) whose time complexity is polynomial
: in the length of the string -- though potentially superpolynomial in
: the length of the regex.
I'm not so sure. In my reduction from CLIQUE to Perl regular expression
matching[*], I provided a way to generate a regular expression for
finding $k-cliques in a graph:
$regex = '^ .*\b '
. join(' , .*\b ' => ('(\d+)') x $k)
. '\b .* ;'
. "\n";
for ($i = 1; $i < $k; $i++) {
for ($j = $i+1; $j <= $k; $j++) {
$regex .= '(?= .* \b ' . "\\$i-\\$j" . ' \b)' . "\n";
}
}
The regular expression for finding 3-cliques is
^ .*\b (\d+) , .*\b (\d+) , .*\b (\d+)\b .* ;
(?= .* \b \1-\2 \b)
(?= .* \b \1-\3 \b)
(?= .* \b \2-\3 \b)
[*] <URL:http://home.hiwaay.net/~gbacon/perl/clique.html>
If for each k we had a specialized k-clique finder that runs in time
polynomial in the size of the graph to be checked (i.e., if your
suspicion were true), then we'd have CLIQUE licked (and every other
problem in NP-complete along with it).
Regular expressions with backreferences can hide all the backtracking
that will be necessary to find that a string doesn't match.
Greg