https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7022
--- Comment #6 from Ivo Truxa <[email protected]> ---
(In reply to John Hardin from comment #5)
> If this is done globally we'll lose the ability to detect some forms of
> obfuscation. On the flip side, discarding the accents may have the effect of
> making that obfuscation pointless.
>
> How does that balance out? Do we gain more from discarding all accents than
> we lose from being able to tell whether or not accents are being used to
> obfuscate a common word, which is a fairly strong spam sign?

Yes, I think it will in fact unmask some of the obfuscation automatically. On the other hand, you are right that some obfuscated words carry higher spam scores than their unobfuscated forms, so that signal would be lost.

There is also the option of appending the ASCII normalization after the Unicode version (or the original). That would satisfy both needs, but it would increase memory usage and database growth.

However, the normalizing is optional, and the administrator can choose what works better for his case. In my case (the vast majority of email on the server is Czech, German or French, in a wide variety of charsets), I know I want plain ASCII normalizing, if only because writing rules is otherwise a nightmare. But I am sure that many other administrators will opt for Unicode normalizing, or none at all.
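The following minimal Python sketch shows the normalization choices discussed in this comment. It assumes tokens have already been decoded to Unicode strings. SpamAssassin itself is written in Perl, and `strip_accents`, `normalize_tokens` and the mode names are hypothetical illustrations, not its actual API or configuration options. Accent stripping uses NFKD decomposition followed by dropping combining marks, which is a common technique. Letters that have no decomposition, such as "ß" or "ø", pass through unchanged, so the "ascii" result is not guaranteed to be pure ASCII.

```python
import unicodedata


def strip_accents(token: str) -> str:
    """Hypothetical helper: fold a token toward plain ASCII by removing diacritics.

    NFKD splits e.g. "é" into "e" + U+0301 COMBINING ACUTE ACCENT; dropping the
    combining marks leaves the base letter. Letters with no decomposition
    (such as "ß" or "ø") are returned unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_tokens(tokens, mode="none"):
    """Hypothetical per-site setting mirroring the options in the comment.

    mode="none"    -> keep tokens exactly as decoded
    mode="unicode" -> canonical composed Unicode form (NFC) only
    mode="ascii"   -> accent-stripped form only (unmasks accent obfuscation)
    mode="both"    -> original token, plus its accent-stripped variant when it
                      differs (keeps the obfuscation signal, but stores more tokens)
    """
    out = []
    for tok in tokens:
        if mode == "none":
            out.append(tok)
        elif mode == "unicode":
            out.append(unicodedata.normalize("NFC", tok))
        elif mode == "ascii":
            out.append(strip_accents(tok))
        elif mode == "both":
            out.append(tok)
            folded = strip_accents(tok)
            if folded != tok:
                out.append(folded)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return out


if __name__ == "__main__":
    print(normalize_tokens(["Příliš", "žluťoučký", "kůň"], "ascii"))
    print(normalize_tokens(["Víágrà", "cheap"], "both"))
```

For example, `normalize_tokens(["Víágrà", "cheap"], "both")` keeps the obfuscated token and also emits "Viagra". This is the trade-off described above: both signals survive, at the cost of extra tokens in the database.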
