http://bugzilla.spamassassin.org/show_bug.cgi?id=4052
------- Additional Comments From [EMAIL PROTECTED] 2005-01-05 02:59 -------
> 1. In our experience, patterns which span 4 or more words, are often more
> effective at catching a small set of spam, but with very low false positive
> rates, than patterns which match only 1 or 2 words.
> Have you tried modifying the generator so that it generates longer patterns
> from the corpus?

OK, I am running experiments with patterns of different lengths and will report the recall/error results for each length.

> 2. We have poor support for decoding between character sets (e.g. converting
> all text strings in mails to UTF-8 where possible). Has this proved to be a
> noticeable issue for this ruleset? (Just wondering!)

Chinese_rules.cf is built to catch Chinese spam written in the GB2312 encoding (simplified Chinese, mainly used in mainland China). If a future SpamAssassin converts all text strings in mails to UTF-8 before applying the ruleset, the current version of Chinese_rules.cf will stop working, because its patterns match GB2312 byte sequences. However, I can convert the ruleset to UTF-8 if necessary.

Best,
Tran

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
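
To illustrate the encoding point in the comment above: the same Chinese phrase has different byte sequences in GB2312 and in UTF-8, so a byte-level pattern written for one will not match text in the other. The following is only a rough sketch of what converting a pattern could look like. The function names `to_hex_escapes` and `recode_pattern` are made up for this example and are not part of SpamAssassin or of Chinese_rules.cf, and the sketch assumes each pattern is a plain literal phrase rather than a regex with metacharacters.

```python
def to_hex_escapes(data: bytes) -> str:
    """Render raw bytes as \\xNN escapes, one escape per byte."""
    return "".join(f"\\x{b:02x}" for b in data)


def recode_pattern(gb_bytes: bytes) -> bytes:
    """Re-encode a literal GB2312 byte pattern as UTF-8 (hypothetical helper).

    Assumes the pattern is a plain phrase; bytes that are not valid GB2312
    raise UnicodeDecodeError instead of being silently dropped.
    """
    return gb_bytes.decode("gb2312").encode("utf-8")


if __name__ == "__main__":
    phrase = "中文"  # sample phrase, not taken from the actual ruleset
    gb = phrase.encode("gb2312")
    print("GB2312:", to_hex_escapes(gb))
    print("UTF-8: ", to_hex_escapes(recode_pattern(gb)))
```

Decoding first and then re-encoding means any pattern that is not valid GB2312 fails loudly, which seems safer for a ruleset than guessing at the intended bytes.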
