https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7022
--- Comment #9 from Kevin A. McGrail <[email protected]> ---
(In reply to Ivo Truxa from comment #8)
> Created attachment 5192
> diffs for all three modules
>
> OK, I am attaching the diffs. Hope I did it correctly.
>
> BTW, the possibilities of obfuscation by Unicode are practically endless:
> you can easily find 20 or often even more accented or visually similar
> variants for practically every letter. This means that already at 5-letter
> words, the number of available permutations can easily go into millions.
> Although each of them may be a strong spam marker, you need to learn them
> all first, and need a sufficiently big Bayes database to keep them all. By
> contrast, if you de-obfuscate them, the original word may help you catch
> the spam better than each of the rarely used variants.
>
> However, all these are just speculations. We need to perform some
> comparative tests to see what is better.

The idea John had ties in very neatly with the idea I had of needing a separate message body without the subject, selected via a tflag. A tflag for a non-obfuscated version of the body, used by specific rules, might help a lot.
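To make the two points in comment #8 concrete, here is a minimal Python sketch (SpamAssassin itself is Perl, and this is not the code from attachment 5192). `deobfuscate` applies Unicode NFKD normalization, drops combining marks (accents), and then maps a few lookalike letters through `HOMOGLYPHS`, a hypothetical and deliberately tiny table; a usable one would need to be far larger. `variant_count` shows the arithmetic behind "millions": with 20 variants per letter, a 5-letter word has 20^5 = 3,200,000 spellings.

```python
import unicodedata

# Hypothetical, deliberately tiny lookalike table: a few Cyrillic and Greek
# letters that render almost identically to Latin ones.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def deobfuscate(text: str) -> str:
    """Map accented and lookalike letters back to plain Latin where possible."""
    # NFKD splits accented letters into base letter + combining marks and
    # folds compatibility forms (e.g. fullwidth letters) to their ASCII base.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents, diaeresis, etc.
        out.append(HOMOGLYPHS.get(ch, ch))
    return "".join(out)


def variant_count(word_length: int, variants_per_letter: int) -> int:
    """Number of spellings if every letter has N interchangeable forms."""
    return variants_per_letter ** word_length


if __name__ == "__main__":
    print(deobfuscate("v\u0456\u00e0gr\u0430"))  # Cyrillic i, accented a, Cyrillic a
    print(variant_count(5, 20))
```

Run as a script, this prints `viagra` and `3200000`. A Bayes database learning raw tokens would have to see each of those millions of spellings separately, while the normalized form collapses them into one token; whether that token is actually a better spam signal than the rare variants is exactly what the comparative tests mentioned above would need to show.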
