Regular expression complexity - a challenge

Robin Houston Fri, 19 Oct 2001 09:06:20 -0700

We've had a thread here before called "Fun with complexity classes",
so I know that _some_ of you enjoy this sort of thing. Apologies to
those who don't.


Recently I've been thinking about the complexity of regular expression
matching. (To be precise, I've been thinking about the membership
rather than the matching problem -- in other words, I've been assuming
that the regular expressions are always anchored at both ends, like
/^...$/)

Let us define the _uniform membership problem_ to be:
  Given a regex and a string, does the regex match the string?

If your regexes allow back referencing, then this problem is
NP-complete. Most of us know that from MJD's web page[1], I imagine,
though it was actually first published in 1990 by Aho[2].

MJD draws the conclusion that "the exponential running time of the
matching algorithm is probably unavoidable". I've come to think that
conclusion is too strong. Let me explain:

Given a regex $r, the _fixed expression membership problem_ for $r is:
  Given a string, does $r match the string?


It seems entirely possible that, for every single regex, its fixed
expression membership problem can be solved in polynomial time. That's
quite consistent with the NP-completeness of the universal problem:
For example, imagine an algorithm which takes a regex (with back
referencing and all), and produces a matching machine for that regex.
The matching machine can then be used to determine whether a string
matches, and the matching machine runs in polynomial time.

The reason there is no conflict is that producing the matching machine
(i.e. compiling the regex) might take (at worst) time exponential in
the length of the regex. That's like what can happen if you go
regex->NFA->DFA: the subset construction for going from NFA to DFA can
take exponential time and space.


So... (is anyone still with me?) I contend that if you give me a
regex (any regex), I can write a sub which takes a string and returns
whether or not the string matches the regex that you gave me, and
which runs in polynomial time.

*Your* challenge is to prove me wrong. Can you give me a regex that I
can't do that for? (No, you're not allowed to use interpolated code
constructs in the regex. That *is* cheating). It'll have to be one
which takes exponential time usually (otherwise I'll just write C<sub
matcher { shift() =~ $regex }>), and it'll have to use back
referencing (otherwise I could just use the ordinary finite automaton
method).

Any takers?

 .robin.

[1] http://perl.plover.com/NPC/
[2] Handbook of Theoretical Computer Science, volume A.

Regular expression complexity - a challenge

Reply via email to