On 04/19/2010 10:07 AM, Jim Meyering wrote:
Paolo Bonzini wrote:
On 04/17/2010 09:27 AM, Jim Meyering wrote:
Paolo Bonzini wrote:
* NEWS: Document improvement.
* src/dfa.c (struct dfa): Add utf8_anychar_classes.
(add_utf8_anychar): New.
(atom): Simplify if/else nesting. Call add_utf8_anychar for ANYCHAR
in UTF-8 locales.
(dfaoptimize): Abort on ANYCHAR.
---
NEWS | 6 ++++++
src/dfa.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 49 insertions(+), 3 deletions(-)
Only quick superficial feedback for now:
I pushed all patches but this.
Thanks!
Would you please add comments describing this one in more detail?
I ran out of time trying to understand how it works.
Something like this?
/* In UTF-8 locales, expand the period into a series of CSETs that
   together match one valid UTF-8 character.  This avoids the slow
   multibyte path.  I'm pretty sure it would be both profitable and
   correct to do this for any encoding; however, the expansion must be
   written by hand, as add_utf8_anychar does above.  So, let's start
   with UTF-8: it is the most widely used, and the structure of the
   encoding makes the correctness more obvious.  */
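
To make the idea in that comment concrete, here is a rough standalone sketch of matching one UTF-8 character purely by byte classes. It is not the code from the patch: the names byte_class, classify and utf8_anychar_len are made up for illustration, and it is simplified (it checks only lead/continuation structure, does not reject every overlong or surrogate sequence, and ignores whether the period should exclude newline).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte classes, named for this sketch only.  Each one
   plays the role of a CSET over single bytes.  */
enum byte_class { B_ASCII, B_CONT, B_LEAD2, B_LEAD3, B_LEAD4, B_INVALID };

static enum byte_class
classify (unsigned char b)
{
  if (b <= 0x7f) return B_ASCII;    /* 00-7f: complete 1-byte character */
  if (b <= 0xbf) return B_CONT;     /* 80-bf: continuation byte */
  if (b <= 0xc1) return B_INVALID;  /* c0-c1: would only encode overlongs */
  if (b <= 0xdf) return B_LEAD2;    /* c2-df: lead byte, 1 continuation */
  if (b <= 0xef) return B_LEAD3;    /* e0-ef: lead byte, 2 continuations */
  if (b <= 0xf4) return B_LEAD4;    /* f0-f4: lead byte, 3 continuations */
  return B_INVALID;                 /* f5-ff: never valid in UTF-8 */
}

/* Return the length in bytes of the single character at S (at most N
   bytes available) that the alternation
     ASCII | LEAD2 CONT | LEAD3 CONT CONT | LEAD4 CONT CONT CONT
   would match, or 0 if none of the branches matches.  */
size_t
utf8_anychar_len (const unsigned char *s, size_t n)
{
  size_t len, i;

  assert (s != NULL);
  if (n == 0)
    return 0;

  switch (classify (s[0]))
    {
    case B_ASCII: len = 1; break;
    case B_LEAD2: len = 2; break;
    case B_LEAD3: len = 3; break;
    case B_LEAD4: len = 4; break;
    default: return 0;
    }

  if (len > n)
    return 0;
  for (i = 1; i < len; i++)
    if (classify (s[i]) != B_CONT)
      return 0;
  return len;
}
```

Because every branch is a fixed sequence of single-byte sets, a byte-oriented DFA can presumably handle it without any multibyte decoding, which is the point of the optimization.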