OK. One last bit on regexps, and then I'll go crawl under a rock for a
while more. This one has to do with prioritized choice.

Classic PCRE actually doesn't match regular expressions at all.
Backreferences require a pushdown automata. Russ took most of those
features out of his input specification, so I'm going to ignore them here.

PCRE implements (|) as "prioritized choice" rather than adopting the
traditional RE rule of "longest match". Ironically, adopting the
prioritized choice model simplifies the token matching problem, because it
resolves the problem of multiple matches, and thereby induces a clear rule
about which token ID should prevail on match. We can obviously add a
priority to the thread context, but that eliminates our ability to avoid
redundant additions of matching threads. The number of total possible
priorities is bounded by the length of the input RE, but still, it's messy.

For a while, I thought we might get away with it on the grounds that
priority seemed to be a strict function of the PC value. Unfortunately, the
fine print of the subpattern rule says that the the higher priority
subpattern only wins if the remainder of the regular expression matches.
It's pretty fundamentally a recursive, backtracking implementation.
Offhand, I see no way (in general) to build a deterministic automata when
this rule is in effect. I can see ways to optimize/simplify a PCRE in such
a way that we can recognize when prioritized choice isn't necessary, but I
don't see a general way to turn an NFA with prioritized matching into a DFA.

This may be a good thing for now, because it means that I can stop screwing
around with re2/PCRE sooner, but given my head is in it at the moment, it
would sure be nice not to do that.

Looking at the pcrematch documentation, it appears to me that
pcre_dfa_exec() doesn't actually implement a DFA at all.

Anybody see a way around this? I confess that I do not, and it seems that
pcre "dfa" does not either.


shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to