https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7022

--- Comment #10 from AXB ---
(In reply to Kevin A. McGrail from comment #9)
> (In reply to Ivo Truxa from comment #8)
> > Created attachment 5192
> > diffs for all three modules
> > 
> > OK, I am attaching the diffs. I hope I did it correctly.
> > 
> > BTW, the possibilities for obfuscation via Unicode are practically endless:
> > you can easily find 20, and often more, accented or visually similar
> > variants for practically every letter. This means that even for 5-letter
> > words, the number of possible permutations can easily run into the millions.
> > Although each of them may be a strong spam marker, you first need to learn
> > them all, and you need a sufficiently large Bayes database to store them all.
> > By contrast, if you de-obfuscate them, the original word may help you catch
> > spam better than any of the rarely used variants.
> > 
> > However, this is all just speculation. We need to run some comparative
> > tests to see which approach works better.
> 
> John's idea ties in very neatly with my idea of using a tflag to get a
> separate version of the message body without the subject. A tflag that gives
> specific rules a de-obfuscated version of the text might help a lot.

Am I getting this right?
1.- Normalization would have to be switched on?
2.- Normalized rules would require a tflag?

If so, that sounds good, though I wonder how this could affect the ok_locale
stuff, etc. (I fear a can of worms.)
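The de-obfuscation idea discussed in this thread can be sketched roughly as follows. This is a minimal Python illustration, not SpamAssassin code (SpamAssassin is written in Perl), and the helper names `fold_accents` and `variant_count` are hypothetical. It assumes the simplest case, accented or fullwidth Latin letters, which Unicode NFKD decomposition followed by removal of combining marks folds back to plain letters. Look-alike letters from other scripts (e.g. Cyrillic "і") are not folded by this approach and would need a confusables mapping such as the one described in Unicode TR #39. The sketch also checks the arithmetic behind the "millions of permutations" estimate: 20 variants per letter over 5 letters gives 20^5 = 3,200,000 spellings.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: fold accented/compatibility letters to base letters."""
    # NFKD splits precomposed letters such as "á" into "a" + U+0301 (combining acute)
    # and maps compatibility forms such as fullwidth "ｖ" to plain "v".
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def variant_count(variants_per_letter: int, word_length: int) -> int:
    """Hypothetical helper: number of distinct spellings of one word."""
    # Each letter position independently takes any of its variants.
    return variants_per_letter ** word_length


print(variant_count(20, 5))     # 3200000
print(fold_accents("Víágrá"))   # Viagra
print(fold_accents("ｖｉａｇｒａ"))  # viagra
```

With folding, all those accented spellings collapse onto one token that Bayes or a body rule only has to learn once; the trade-off raised above is that the folded text loses the "this word was deliberately obfuscated" signal, which is why exposing it only to rules that opt in (e.g. via a tflag) is attractive.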

