[Bug 7022] normalize_charset

bugzilla-daemon Wed, 12 Mar 2014 15:53:28 -0700

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7022


--- Comment #8 from Ivo Truxa <[email protected]> ---
Created attachment 5192
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5192&action=edit
diffs for all three modules

OK, I am attaching the diffs. Hope I did it correctly.

BTW, the possibilities for obfuscation with Unicode are practically endless:
you can easily find 20 or more accented or visually similar variants for
practically every letter. That means that even for 5-letter words, the number
of possible variants can easily run into the millions (20^5 = 3,200,000).
Although each of them may be a strong spam marker, you need to learn them all
first, and you need a sufficiently big Bayes database to hold them all. In
contrast, if you de-obfuscate them, the original word may help you catch the
spam better than each of the rarely used variants.
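
To make the numbers concrete, here is a minimal Python sketch of the idea. It
is an illustration only and not necessarily what the attached diffs do: the
function name deobfuscate and the sample tokens are hypothetical, and the
folding uses plain NFKD decomposition from the standard unicodedata module.
The figures of 20 variants per letter and 5-letter words are the ones from the
paragraph above.

```python
import unicodedata


def deobfuscate(word: str) -> str:
    """Fold accented letters to their base letters (hypothetical helper).

    NFKD splits each accented character into a base letter followed by
    combining marks; dropping the combining marks leaves the base letters.
    """
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Size of the variant space: 20 variants per letter, 5-letter word.
variants_per_letter = 20
word_length = 5
print(variants_per_letter ** word_length)

# Differently accented spellings collapse to a single Bayes token.
for token in ["ċäsīnö", "cásìnô", "çãsíno"]:
    print(token, "->", deobfuscate(token))
```

Note that this only handles accented variants. Visually similar letters from
other scripts (for example, Cyrillic "а" in place of Latin "a") have no
decomposition and would need a separate lookup table.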

However, all of this is just speculation. We need to run some comparative
tests to see which approach works better.

-- 
You are receiving this mail because:
You are the assignee for the bug.
