http://bugzilla.spamassassin.org/show_bug.cgi?id=4052





------- Additional Comments From [EMAIL PROTECTED]  2005-01-05 02:59 -------
> 1. In our experience, patterns which span 4 or more words, are often more
> effective at catching a small set of spam, but with very low false positive
> rates, than patterns which match only 1 or 2 words.

> Have you tried modifying the generator so that it generates longer patterns
> from the corpus?

OK, I am running experiments with patterns of different lengths and will
report the recall/error results for each length.


> 2. We have poor support for decoding between character sets (e.g. converting
> all text strings in mails to UTF-8 where possible).  Has this proved to be a
> noticeable issue for this ruleset? (Just wondering!)

Chinese_rules.cf is built to catch Chinese spam encoded in GB2312
(simplified Chinese, mainly used in mainland China).

If a future version of SpamAssassin converts all text strings in mails to
UTF-8 before applying the ruleset, the current version of Chinese_rules.cf
will stop matching, because its patterns target GB2312 byte sequences.
However, I can convert the ruleset to UTF-8 if necessary.
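
To illustrate the encoding issue, here is a minimal Python sketch (not
SpamAssassin code). The phrase and message bodies are hypothetical examples
and are not taken from Chinese_rules.cf. It assumes a rule is effectively a
byte-level pattern: a pattern built from GB2312 bytes matches a GB2312 body
but not the same text in UTF-8, and re-encoding the pattern restores the
match.

```python
import re

# Hypothetical spam phrase, used only for illustration.
phrase = "发票"

# The same phrase yields different byte sequences in each encoding.
gb_bytes = phrase.encode("gb2312")
utf8_bytes = phrase.encode("utf-8")

# A byte-level pattern built from the GB2312 form.
gb_pattern = re.compile(re.escape(gb_bytes))

# Converting the pattern: decode from GB2312, re-encode as UTF-8.
converted = gb_bytes.decode("gb2312").encode("utf-8")
utf8_pattern = re.compile(re.escape(converted))


def matches(pattern: "re.Pattern[bytes]", body: bytes) -> bool:
    """Return True if the byte pattern occurs anywhere in the body."""
    return pattern.search(body) is not None


# Hypothetical message body in both encodings.
text = "代开发票"
gb_body = text.encode("gb2312")
utf8_body = text.encode("utf-8")

print(matches(gb_pattern, gb_body))      # GB2312 pattern, GB2312 body
print(matches(gb_pattern, utf8_body))    # GB2312 pattern, UTF-8 body
print(matches(utf8_pattern, utf8_body))  # converted pattern, UTF-8 body
```

The second check is the case described above: once the body is normalized
to UTF-8, the GB2312 pattern no longer finds the phrase.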

Best,
Tran





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
