https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7022
--- Comment #10 from AXB <[email protected]> ---
(In reply to Kevin A. McGrail from comment #9)
> (In reply to Ivo Truxa from comment #8)
> > Created attachment 5192 [details]
> > diffs for all three modules
> >
> > OK, I am attaching the diffs. Hope I did it correctly.
> >
> > BTW, the possibilities of the obfuscation by Unicode are practically endless
> > - you will find easily 20 or often even more accented or visually similar
> > variants for practically every letter. It means that already at 5-letter
> > words, the number of available permutations can easily go into millions.
> > Although each of them may be a strong spam marker, you need to learn them
> > all first, and need a sufficiently big Bayes database to keep them all. In
> > contrary, if you de-obfuscate them, the original word may help you to catch
> > the spam better than each of the rarely used variants.
> >
> > However, all these are just speculations. We need to perform some
> > comparative tests to see what is better.
>
> The idea John had ties in very neatly to the idea I had of needing a
> separate body message without the subject via a tflag. A tflag for a
> non-obfuscated version for specific rules might help a lot.

Am I getting this right?

1.- normalization would have to be switched on ?
2.- normalized rules would require a tflag ?

If yes, sounds good tho I wonder how this could affect the ok_locale stuff,
etc, etc (fearing a can of worms)
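
For reference, the arithmetic quoted from comment #8 and the basic idea of de-obfuscation can be sketched in a few lines of Python. This is only an illustration, not the approach taken in the attached diffs (which are not shown here): the function names `variant_count` and `strip_accents` are made up for this sketch, and the folding relies on plain Unicode NFKD decomposition, which handles accented letters but not visually similar letters from other scripts.

```python
import unicodedata

# Figures quoted in comment #8: roughly 20 look-alike variants per letter,
# applied to a 5-letter word.
VARIANTS_PER_LETTER = 20
WORD_LENGTH = 5


def variant_count(variants_per_letter: int, word_length: int) -> int:
    """Number of distinct spellings if every letter can be swapped independently."""
    return variants_per_letter ** word_length


def strip_accents(text: str) -> str:
    """Hypothetical helper: fold accented letters to their base letters.

    NFKD splits a character such as U+00E0 into a base letter plus a
    combining mark; the combining marks are then dropped. Look-alike
    letters from other scripts (e.g. Cyrillic) have no such decomposition
    and pass through unchanged, so they would need a separate mapping table.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(variant_count(VARIANTS_PER_LETTER, WORD_LENGTH))  # 3200000
print(strip_accents("v\u00ec\u00e1gr\u00e0"))           # viagra
```

With 20 variants per letter, a 5-letter word already has 20^5 = 3,200,000 possible spellings, which is the "millions" mentioned above; folding them back to one token is what would let Bayes learn a single word instead of each rare variant.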
