http://bugzilla.spamassassin.org/show_bug.cgi?id=4052

------- Additional Comments From [EMAIL PROTECTED]  2005-01-03 11:04 -------
Hi --

these look very interesting, and I like the methodology! (I also notice, from
the comments in the .cf file, that the recall and error rates have improved on
the figures quoted in the presentations; the current figures look very useful!)

Would it be possible for you to sign and fax an Apache CLA so that we can
incorporate these (or at least test them)? Details are at:
http://www.apache.org/licenses/#clas

OK, a few questions:

1. In our experience, patterns that span 4 or more words often catch only a
small set of spam, but do so with much lower false-positive rates than patterns
that match only 1 or 2 words.

Have you tried modifying the generator so that it produces longer patterns from
the corpus? It would increase memory use in the generator, but it should yield a
smaller number of more reliable rules that could supplement the shorter ones.
This small set of long rules might then warrant higher scores than the larger
set of short rules.
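To make the long-pattern idea concrete, here is a rough sketch in Python (chosen
only for illustration; it is not the ruleset's actual generator). It assumes each
message has already been split into a list of words (Chinese text would need a
word segmenter first), and the names `long_pattern_candidates`, `n` and
`min_spam_hits` are hypothetical. It keeps word n-grams of length `n` that occur
in at least `min_spam_hits` spam messages and in no ham message. The spam counter
and the `ham_seen` set hold every n-gram in the corpus, which is where the extra
memory goes.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> Set[NGram]:
    """Return the distinct n-grams (as tuples) found in one tokenized message."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def long_pattern_candidates(
    spam: Iterable[Sequence[str]],
    ham: Iterable[Sequence[str]],
    n: int = 4,
    min_spam_hits: int = 2,
) -> List[NGram]:
    """Hypothetical sketch: keep n-grams seen in several spams and in no ham."""
    spam_counts: Counter = Counter()
    for msg in spam:
        # Each n-gram is counted at most once per message.
        spam_counts.update(ngrams(msg, n))
    ham_seen: Set[NGram] = set()
    for msg in ham:
        ham_seen |= ngrams(msg, n)
    # Candidates must recur in spam and never appear in ham.
    return sorted(
        g for g, c in spam_counts.items()
        if c >= min_spam_hits and g not in ham_seen
    )
```

For example, with two toy spams `["buy", "cheap", "pills", "online", "now"]` and
`["get", "cheap", "pills", "online", "now"]` and one ham
`["cheap", "pills", "online", "are", "bad"]`, only
`("cheap", "pills", "online", "now")` survives: it appears in both spams, and the
ham's 4-grams differ from it.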

2. We have poor support for converting between character sets (e.g. converting
all text strings in mails to UTF-8 where possible). Has this been a noticeable
problem for this ruleset? (Just wondering!)
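For reference, one rough way to normalize text before rules are applied is to
decode each text part using its declared charset and fall back to UTF-8 when the
label is missing or unknown. This is a Python sketch, not SpamAssassin's own Perl
code path. The function name `to_unicode` is hypothetical, and treating `gb2312`
and `gbk` labels as GB18030 is an assumption, based on mail that is labelled
GB2312 but actually uses characters from the larger GBK/GB18030 sets.

```python
import codecs
from typing import Optional


def to_unicode(raw: bytes, declared_charset: Optional[str]) -> str:
    """Hypothetical helper: decode a text part to str using its declared charset.

    Call .encode("utf-8") on the result if UTF-8 bytes are needed.
    """
    charset = (declared_charset or "utf-8").strip().lower()
    # Assumption: decode GB2312/GBK labels with the GB18030 superset, since
    # mislabelled mail often uses characters outside GB2312.
    if charset in ("gb2312", "gbk"):
        charset = "gb18030"
    try:
        codecs.lookup(charset)  # raises LookupError for unknown charset names
    except LookupError:
        charset = "utf-8"
    # errors="replace" keeps going on malformed bytes instead of raising.
    return raw.decode(charset, errors="replace")
```

Decoding with `errors="replace"` trades a few replacement characters for never
failing on a malformed message, which seems reasonable for rule matching.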

3. Our default ruleset does not handle Chinese mail well in general: it
apparently misses a lot of spam and causes false positives on ham messages.
It would be *very* useful if we could set up nightly mass-checks against a good
Chinese ham corpus, to help avoid future FPs.

There are two ways to do that: either one of the existing developers obtains a
(confidential) copy of the corpus and adds it to their collection, if that's
permissible, or your group sets up a nightly mass-check as described here:
http://wiki.apache.org/spamassassin/NightlyMassCheck

If either would be possible, that would be really great ;)

--j.




