Mark H Weaver <m...@netris.org>, Thu, 17 Mar 2011 21:38:28 -0400
> If we may assume that the searched string is valid UTF-8, and when only ASCII characters are excluded (e.g. "."), then three additional states are required in the generated DFA. Let us call them S1, S2, and S3.
>
> [handling these states]
>
> When non-ASCII characters are excluded, additional states must be added: one for each unique prefix of the excluded multibyte characters. It's quite straightforward.

I don't understand what "excluded" means here. Is this a property of the string, the regexp, the (dynamic, environmental) operation, or ...?
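
For concreteness, here is a minimal sketch of one possible reading of the quoted text, in which "excluded" means the characters a regexp element such as "." refuses to match. The state names S1, S2, and S3 come from the quote; the transitions, the helper names (`step`, `match_one_char`, `excluded_prefixes`), and the choice of newline as the character excluded by "." are assumptions of mine, since the handling of the states was elided above.

```python
# Hypothetical sketch: a byte-level DFA that matches exactly one UTF-8
# character not in an excluded set of ASCII bytes. The transitions are an
# assumed reading of the quoted text, not the original design.
START, S1, S2, S3, ACCEPT, REJECT = "START", "S1", "S2", "S3", "ACCEPT", "REJECT"


def step(state, byte, excluded_ascii):
    """Return the next state after consuming one byte."""
    if state == START:
        if byte < 0x80:
            return REJECT if byte in excluded_ascii else ACCEPT
        if 0xC2 <= byte <= 0xDF:
            return S1  # one continuation byte still expected
        if 0xE0 <= byte <= 0xEF:
            return S2  # two continuation bytes still expected
        if 0xF0 <= byte <= 0xF4:
            return S3  # three continuation bytes still expected
        return REJECT
    if state in (S1, S2, S3) and 0x80 <= byte <= 0xBF:
        return {S1: ACCEPT, S2: S1, S3: S2}[state]
    return REJECT


def match_one_char(data, excluded_ascii=frozenset(b"\n")):
    """Return the byte length of the first character if it matches, else None."""
    state = START
    for i, byte in enumerate(data):
        state = step(state, byte, excluded_ascii)
        if state == ACCEPT:
            return i + 1
        if state == REJECT:
            return None
    return None  # input ended in the middle of a character


def excluded_prefixes(chars):
    """Proper, non-empty byte prefixes of the excluded multibyte characters."""
    prefixes = set()
    for ch in chars:
        encoded = ch.encode("utf-8")
        for n in range(1, len(encoded)):
            prefixes.add(encoded[:n])
    return prefixes
```

Under this reading, S1 through S3 only count the continuation bytes still owed, which is why their number does not depend on which ASCII characters are excluded. `excluded_prefixes` illustrates the second claim: excluding "é" and "ê" would add a single state for their shared lead byte, while excluding "€" would add two.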