[Bug 7022] normalize_charset

bugzilla-daemon Wed, 12 Mar 2014 17:37:27 -0700

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7022


--- Comment #17 from Ivo Truxa <[email protected]> ---
In fact, I think it can be made transparent to the user or the rule developer,
so that they do not need to bother. Just as now, I'd leave the admin the
choice to disable normalizing altogether, enable Unicode normalizing, or
enable ASCII normalizing.

Then SA, when processing rules, would check whether a rule contains non-ASCII
characters. If it does, the rule would be matched against the UTF-8 version or
against the non-normalized version (depending on normalize_charset); otherwise,
against the ASCII-normalized one.

This would cover the vast majority of cases. Only in rather rare cases might
someone want to run an ASCII regex on the non-ASCII version, and in such a case
a special tflag could be used.
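
For illustration only, here is a minimal Python sketch of the per-rule selection proposed above. It is not SpamAssassin code, which is written in Perl. Only normalize_charset is an existing SpamAssassin option. The rendering names ("raw", "utf8", "ascii"), the ascii_normalize switch, the nonascii_body tflag and the helper functions are hypothetical. The sketch also assumes the UTF-8 and ASCII normalizing can be switched on independently.

```python
import re

# Hypothetical tflag: forces a plain-ASCII rule onto the non-ASCII version.
# SpamAssassin has no tflag by this name; it only illustrates the idea.
FORCE_NONASCII_TFLAG = "nonascii_body"


def choose_rendering(pattern, normalize_charset, ascii_normalize,
                     tflags=frozenset()):
    """Return the body rendering a rule should be matched against.

    Result is "raw" (not normalized), "utf8" (Unicode-normalized) or
    "ascii" (ASCII-normalized). These names and the ascii_normalize
    switch are assumptions made for this sketch.
    """
    # The version used for rules with non-ASCII characters:
    # UTF-8 when normalize_charset is on, otherwise the raw text.
    non_ascii_side = "utf8" if normalize_charset else "raw"

    if not pattern.isascii():
        # The rule itself contains non-ASCII characters.
        return non_ascii_side
    if FORCE_NONASCII_TFLAG in tflags:
        # Rare case: an ASCII regex meant to run on the non-ASCII version.
        return non_ascii_side
    # Plain-ASCII rule: use the ASCII-normalized text if it was enabled.
    return "ascii" if ascii_normalize else non_ascii_side


def rule_hits(pattern, renderings, normalize_charset, ascii_normalize,
              tflags=frozenset()):
    """Run one rule's regex against the rendering chosen for it."""
    key = choose_rendering(pattern, normalize_charset, ascii_normalize, tflags)
    return re.search(pattern, renderings[key]) is not None


# Example: a Czech body sent in ISO-8859-2.
text = "Záloha na účet"
renderings = {
    # Undecoded ISO-8859-2 bytes read as Latin-1: "č" shows up as "è".
    "raw": text.encode("iso-8859-2").decode("latin-1"),
    "utf8": text,
    "ascii": "Zaloha na ucet",
}

print(rule_hits("ucet", renderings, True, True))    # True  (ascii)
print(rule_hits("účet", renderings, True, True))    # True  (utf8)
print(rule_hits("účet", renderings, False, True))   # False (raw)
```

With both switches off, every rule in this sketch sees the raw text, which corresponds to the current default of no normalizing.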

However, as I said previously, I think the default setting should stay as it
is (no normalizing), but both UTF-8 and ASCII normalizing should be available
to administrators who want to use them, regardless of whether any tflag for the
normalized/non-normalized versions is available.

Finally, if I am not mistaken, there is currently also no tflag for Unicode
normalizing, so any rules written for UTF-8, or for specific code pages, also
do not always work correctly.
