Re: [PATCH 3/3] dfa: optimize UTF-8 period

Paolo Bonzini Mon, 19 Apr 2010 05:06:34 -0700

On 04/19/2010 10:07 AM, Jim Meyering wrote:

Paolo Bonzini wrote:

On 04/17/2010 09:27 AM, Jim Meyering wrote:

Paolo Bonzini wrote:

* NEWS: Document improvement.
* src/dfa.c (struct dfa): Add utf8_anychar_classes.
(add_utf8_anychar): New.
(atom): Simplify if/else nesting.  Call add_utf8_anychar for ANYCHAR
in UTF-8 locales.
(dfaoptimize): Abort on ANYCHAR.
---
   NEWS      |    6 ++++++
   src/dfa.c |   46 +++++++++++++++++++++++++++++++++++++++++++---
   2 files changed, 49 insertions(+), 3 deletions(-)


Only quick superficial feedback for now:


I pushed all patches but this.


Thanks!

Would you please add comments describing this one in more detail?
I ran out of time trying to understand how it works.


Something like this?

 /* For UTF-8, expand the period to a series of CSETs that define a valid
    UTF-8 character.  This avoids using the slow multibyte path.  I'm
    pretty sure it would be both profitable and correct to do it for
    any encoding; however, the optimization must be written by hand for
    each encoding, as is done above in add_utf8_anychar.  So, let's start
    with UTF-8: it is the most used, and the structure of the encoding
    makes the correctness more obvious.  */
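
To make the idea in that comment concrete, here is a rough standalone
sketch; it is not the code from the patch.  It treats "any character" as an
alternation of byte-class sequences, which is the kind of thing a
byte-oriented DFA can match without a multibyte path.  The helper names
(in_range, utf8_anychar_len) are hypothetical, and the byte ranges are a
simplification I am assuming for illustration: they follow the general
shape of UTF-8 but do not reject overlong 3- and 4-byte forms or
surrogates, and they ignore whether the period should match newline.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if byte B lies in the class [LO, HI].  */
static int
in_range (unsigned char b, unsigned char lo, unsigned char hi)
{
  return lo <= b && b <= hi;
}

/* Hypothetical helper: return the length (1 to 4) of the UTF-8-shaped
   character at the start of S (N bytes available), or 0 if the bytes
   do not match any of these alternatives:

     [\x00-\x7f]
     [\xc2-\xdf][\x80-\xbf]
     [\xe0-\xef][\x80-\xbf][\x80-\xbf]
     [\xf0-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]

   Each bracket is a plain byte class, so no multibyte decoding is
   needed.  The ranges are an assumption made for this sketch.  */
static size_t
utf8_anychar_len (const unsigned char *s, size_t n)
{
  size_t len, i;

  if (n == 0)
    return 0;

  /* The lead byte selects which alternative applies.  */
  if (in_range (s[0], 0x00, 0x7f))
    len = 1;
  else if (in_range (s[0], 0xc2, 0xdf))
    len = 2;
  else if (in_range (s[0], 0xe0, 0xef))
    len = 3;
  else if (in_range (s[0], 0xf0, 0xf7))
    len = 4;
  else
    return 0;

  if (n < len)
    return 0;

  /* Every remaining byte must be a continuation byte.  */
  for (i = 1; i < len; i++)
    if (!in_range (s[i], 0x80, 0xbf))
      return 0;

  return len;
}
```

Only five distinct byte classes appear above (one per lead-byte range plus
the continuation class), which is why the expansion stays small.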
