[Bug 5691] Slow rules due to charset normalization not always clearing utf8 flag

bugzilla-daemon Wed, 24 Oct 2007 15:02:22 -0700

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5691



[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




------- Additional Comments From [EMAIL PROTECTED]  2007-10-24 15:01 -------
The proposed patch is incorrect and this bug is INVALID.

The purpose of the utf8::downgrade call is to gain the speed benefit of a
cleared utf8 flag whenever the string's characters can be represented without
it.  In this particular case the characters cannot be represented without the
utf8 flag, so the call leaves the flag set, as it is intended to do.
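
To illustrate the behavior described above, here is a minimal standalone
sketch.  It is not the SpamAssassin code itself; the variable names and sample
strings are made up for the example, and it assumes the call is made with the
FAIL_OK argument set so that a failed downgrade returns false instead of dying.

```perl
use strict;
use warnings;

# Hypothetical sample 1: every character fits in one byte (code points <= 0xFF).
my $latin = "caf\x{e9}";
utf8::upgrade($latin);                        # force the utf8 flag on
my $latin_flag_before = utf8::is_utf8($latin) ? 1 : 0;
my $latin_ok          = utf8::downgrade($latin, 1) ? 1 : 0;   # 1 = FAIL_OK
my $latin_flag_after  = utf8::is_utf8($latin) ? 1 : 0;

# Hypothetical sample 2: code points above 0xFF cannot be stored without the flag.
my $wide = "\x{4e2d}\x{6587}";
my $wide_ok         = utf8::downgrade($wide, 1) ? 1 : 0;      # fails quietly
my $wide_flag_after = utf8::is_utf8($wide) ? 1 : 0;

print "latin: before=$latin_flag_before ok=$latin_ok after=$latin_flag_after\n";
print "wide:  ok=$wide_ok after=$wide_flag_after\n";
```

The first string loses its utf8 flag; the second keeps it because there is no
flag-free representation of those characters, which is the case this bug
report describes.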

Many rules are not charset-normalization-aware and thus may perform poorly or
incorrectly with charset normalization enabled.  For example, I have seen rules
test for non-ASCII by using [\x80-\xff].  With charset normalization, they need
to use [^\x00-\x7f] instead, since normalized text can contain characters above
\xff.  Similarly, rules might need to use [0-9] instead of \d, and \s and \w
might match more characters than intended.
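
The differences between these character classes can be checked with a short
standalone script.  This is only a sketch: the sample characters are chosen
for illustration and are not taken from any actual rule or message.

```perl
use strict;
use warnings;

# Hypothetical sample characters, all with code points above 0xFF.
my $cjk          = "\x{4e2d}";   # CJK ideograph
my $arabic_digit = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my $ideo_space   = "\x{3000}";   # IDEOGRAPHIC SPACE

# Non-ASCII test: the byte-oriented class misses characters above \xff.
my $old_nonascii = ($cjk =~ /[\x80-\xff]/)  ? 1 : 0;
my $new_nonascii = ($cjk =~ /[^\x00-\x7f]/) ? 1 : 0;

# \d follows Unicode rules on such strings; [0-9] stays ASCII-only.
my $d_class = ($arabic_digit =~ /\d/)    ? 1 : 0;
my $d_range = ($arabic_digit =~ /[0-9]/) ? 1 : 0;

# \s and \w likewise match non-ASCII whitespace and word characters.
my $s_class = ($ideo_space =~ /\s/) ? 1 : 0;
my $w_class = ($cjk        =~ /\w/) ? 1 : 0;

print "non-ASCII: old=$old_nonascii new=$new_nonascii\n";
print "digit:     \\d=$d_class [0-9]=$d_range\n";
print "space/word: \\s=$s_class \\w=$w_class\n";
```

A rule written with [\x80-\xff] silently stops matching such text, while \d,
\s and \w start matching characters the rule author probably never meant to
include.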




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
