On 04/19/2010 10:07 AM, Jim Meyering wrote:
Paolo Bonzini wrote:
On 04/17/2010 09:27 AM, Jim Meyering wrote:
Paolo Bonzini wrote:
* NEWS: Document improvement.
* src/dfa.c (struct dfa): Add utf8_anychar_classes.
(add_utf8_anychar): New.
(atom): Simplify if/else nesting.  Call add_utf8_anychar for ANYCHAR
in UTF-8 locales.
(dfaoptimize): Abort on ANYCHAR.
---
   NEWS      |    6 ++++++
   src/dfa.c |   46 +++++++++++++++++++++++++++++++++++++++++++---
   2 files changed, 49 insertions(+), 3 deletions(-)

Only quick superficial feedback for now:

I pushed all patches but this.

Thanks!

Would you please add comments describing this one in more detail?
I ran out of time trying to understand how it works.

Something like this?

 /* For UTF-8, expand the period to a series of CSETs that define a valid
    UTF-8 character.  This avoids using the slow multibyte path.  I'm
    pretty sure it would be both profitable and correct to do it for
    any encoding; however, the optimization must be done manually, as
    it is done above in add_utf8_anychar.  So, let's start with
    UTF-8: it is the most widely used, and the structure of the encoding
    makes the correctness more obvious.  */
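
To illustrate the idea behind that comment, here is a minimal standalone sketch, not the code from the patch. It assumes the usual structural layout of UTF-8 (one lead-byte class per sequence length, followed by continuation bytes in 0x80-0xbf), which is roughly what a series of byte-level CSETs would encode. The function name utf8_anychar_len is hypothetical, and the check is deliberately simplified: it does not reject overlong forms, surrogates, or code points above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper, not part of dfa.c.  Return the length in bytes
   (1 to 4) of the UTF-8 character at the start of BUF, or 0 if the
   first N bytes do not begin with a structurally valid sequence.
   Each branch corresponds to one byte class:
     00-7f  one-byte sequence
     c2-df  lead byte of a two-byte sequence
     e0-ef  lead byte of a three-byte sequence
     f0-f7  lead byte of a four-byte sequence
     80-bf  continuation byte  */
static size_t
utf8_anychar_len (const unsigned char *buf, size_t n)
{
  size_t len, i;
  unsigned char c;

  if (n == 0)
    return 0;

  c = buf[0];
  if (c <= 0x7f)
    len = 1;
  else if (c >= 0xc2 && c <= 0xdf)
    len = 2;
  else if (c >= 0xe0 && c <= 0xef)
    len = 3;
  else if (c >= 0xf0 && c <= 0xf7)
    len = 4;
  else
    return 0;   /* stray continuation byte or invalid lead byte */

  if (len > n)
    return 0;   /* sequence is truncated */

  /* Every byte after the lead byte must be a continuation byte.  */
  for (i = 1; i < len; i++)
    if (buf[i] < 0x80 || buf[i] > 0xbf)
      return 0;

  return len;
}
```

Because each step looks at exactly one byte, the same alternation can be written as ordinary single-byte character classes, which is presumably why it can stay off the multibyte path.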

