[Bug 7022] normalize_charset

bugzilla-daemon Wed, 12 Mar 2014 15:57:37 -0700

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7022


--- Comment #9 from Kevin A. McGrail <[email protected]> ---
(In reply to Ivo Truxa from comment #8)
> Created attachment 5192 [details]
> diffs for all three modules
> 
> OK, I am attaching the diffs. Hope I did it correctly.
> 
> BTW, the possibilities for obfuscation with Unicode are practically endless:
> you can easily find 20 or more accented or visually similar variants for
> practically every letter. That means that even for 5-letter words, the
> number of possible spellings can easily run into the millions. Although
> each of them may be a strong spam marker, you need to learn them all first,
> and you need a sufficiently big Bayes database to hold them all. By
> contrast, if you de-obfuscate them, the original word may help you catch
> the spam better than any of the rarely used variants.
> 
> However, all of this is just speculation. We need to run some
> comparative tests to see which approach works better.

John's idea ties in neatly with one I had: using a tflag to give rules a
separate message body without the subject.  A tflag that gives specific rules
a non-obfuscated version of the text might help a lot.
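
For illustration, the estimate in the quoted comment and one simple form of
de-obfuscation can be sketched in Python. This is only a sketch under stated
assumptions: it is not the attached patch and not SpamAssassin's
normalize_charset code, the name deobfuscate is hypothetical, and the figure
of 20 variants per letter is taken from the comment above. It handles accented
variants only; visually similar letters from other scripts (for example
Cyrillic "а") have no Unicode decomposition and would need a separate mapping.

```python
import unicodedata

# Figures from the quoted comment: about 20 variants per letter, 5-letter word.
VARIANTS_PER_LETTER = 20
WORD_LENGTH = 5
spellings = VARIANTS_PER_LETTER ** WORD_LENGTH
print(spellings)  # 3200000


def deobfuscate(text: str) -> str:
    """Hypothetical helper: strip accents via NFKD decomposition.

    NFKD splits an accented letter into its base letter plus combining
    marks; dropping the combining marks leaves the base letter.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(deobfuscate("Vìágrä"))  # Viagra
```

With such a step, all accented spellings of a word collapse to one token, so
Bayes would need to learn one entry rather than each rare variant. Whether
that actually catches more spam is exactly what the comparative tests would
have to show.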

-- 
You are receiving this mail because:
You are the assignee for the bug.
