Mark H Weaver <m...@netris.org>
Thu, 17 Mar 2011 13:58:42 -0400

   * regexp search: The search itself can be implemented bytewise, exactly
     as if it was a fixed-width encoding.  Compiling the regexp can
     _almost_ be implemented as if the UTF-8-encoded regexp was in a
     fixed-width encoding, with just one added complication: a multibyte
     character followed by `*', `?' etc, must be compiled in such a way
     that the suffix operator applies to the whole character, and not just
     its final byte.  (In practice, it's probably more straightforward to
     handle compilation somewhat differently than outlined here, but you
     get the idea.)
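
To make the suffix-operator complication in the quoted text concrete, here is a minimal sketch using Python's `re` module on byte strings. It is only an illustration, not how any particular regexp engine compiles patterns; the names `naive` and `grouped` are made up for this example.

```python
import re

# "é" is U+00E9, which UTF-8 encodes as the two bytes C3 A9.
E_ACUTE = "é".encode("utf-8")

# Naive bytewise compile: the pattern bytes are a, C3, A9, '*', b,
# so '*' binds only to the final byte A9 of the character.
naive = re.compile(b"a" + E_ACUTE + b"*b")

# Wrapping the whole multibyte character in a group first makes
# '*' apply to the complete two-byte sequence.
grouped = re.compile(b"a(?:" + E_ACUTE + b")*b")

subject = "aééb".encode("utf-8")  # bytes: a C3 A9 C3 A9 b

naive_hit = naive.fullmatch(subject) is not None
grouped_hit = grouped.fullmatch(subject) is not None

# The naive pattern also accepts a truncated (invalid) sequence,
# because A9 may repeat zero times while C3 stays mandatory.
naive_accepts_truncated = naive.fullmatch(b"a\xc3b") is not None
```

With these inputs the naive pattern fails on "aééb" but accepts the malformed bytes `a C3 b`, while the grouped pattern matches "aééb" as intended.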

In unibyte land, "." matches a byte.  OK.

In multibyte land done "bytewise", "." matches ____________.
(What goes in the blank?)
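
One possible way to fill in the blank, offered only as a sketch: for a bytewise matcher to keep "." meaning "one character", the compiler would have to expand it into an alternation over every well-formed UTF-8 byte sequence. The example below uses Python's `re` on byte strings; `UTF8_CHAR` is a hypothetical name, and the byte ranges follow the well-formed sequence table in RFC 3629, with newline excluded to mirror the usual "." behaviour.

```python
import re

# One complete UTF-8 encoded character, excluding newline (0x0A).
UTF8_CHAR = (
    rb"(?:[\x00-\x09\x0B-\x7F]"              # 1 byte (ASCII)
    rb"|[\xC2-\xDF][\x80-\xBF]"              # 2 bytes
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"          # 3 bytes, no overlongs
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"   # 3 bytes
    rb"|\xED[\x80-\x9F][\x80-\xBF]"          # 3 bytes, no surrogates
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"       # 4 bytes, no overlongs
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"           # 4 bytes
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2})"      # 4 bytes, up to U+10FFFF
)

# Unibyte-style ".": matches exactly one byte.
byte_dot = re.compile(rb"a.b")

# "." expanded to a whole UTF-8 character.
char_dot = re.compile(rb"a" + UTF8_CHAR + rb"b")

subject = "aéb".encode("utf-8")  # bytes: a C3 A9 b

byte_dot_hit = byte_dot.fullmatch(subject) is not None
char_dot_hit = char_dot.fullmatch(subject) is not None
```

Here the one-byte "." cannot span the two bytes of "é", so `byte_dot` fails where `char_dot` succeeds. The size of the expansion is the point: "." stops being a single primitive once matching is done bytewise.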
