https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155

--- Comment #9 from Warren Togami <[email protected]>  2009-08-17 20:46:17 PST ---
http://ruleqa.spamassassin.org/20090817-r804903-n/TVD_SPACE_RATIO/detail
90% FP rate for Japanese
http://ruleqa.spamassassin.org/20090817-r804903-n/PLING_QUERY/detail
52% FP rate for Japanese
http://ruleqa.spamassassin.org/20090817-r804903-n/GAPPY_SUBJECT/detail
44% FP rate for Japanese

All three of these rules perform very poorly on Japanese mail, and the total
SPAM% is lower than the FP%.  Yet the GA scores are rather high, because the
corpus does not contain a statistically significant amount of Japanese mail.
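
As a rough illustration of that last point, the sketch below shows how a rule
can have a 90% FP rate on Japanese ham while its FP rate over the whole corpus
stays small, simply because Japanese mail is a tiny share of the total.  The
corpus sizes and hit counts are hypothetical, chosen only for the arithmetic;
they are not taken from the ruleqa pages, and the names (corpus, fp_rate) are
made up for this example.

```python
# Hypothetical numbers for illustration only; NOT taken from ruleqa.
def fp_rate(ham_hits, ham_total):
    """Percentage of ham messages that a rule fires on."""
    return 100.0 * ham_hits / ham_total

# language -> (ham messages in corpus, ham messages hit by the rule)
corpus = {
    "japanese": (200, 180),   # 90% FP within the Japanese ham
    "other": (99800, 100),    # roughly 0.1% FP everywhere else
}

total_ham = sum(total for total, _ in corpus.values())
total_hits = sum(hits for _, hits in corpus.values())

japanese_total, japanese_hits = corpus["japanese"]
japanese_fp = fp_rate(japanese_hits, japanese_total)
overall_fp = fp_rate(total_hits, total_ham)

print(f"Japanese FP rate: {japanese_fp:.1f}%")
print(f"Overall FP rate:  {overall_fp:.2f}%")
```

With these made-up counts the overall FP rate is well under 1%, so a score
derived from corpus-wide totals would see little penalty for the rule even
though it misfires on most Japanese ham.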

What language are the SPAM hits in?  Perhaps many of them are cases of the
rule identifying a foreign language rather than determining whether the
message is ham or spam?

Bug #6149 is related to this problem.

I am attempting to convince Japanese, Chinese and Korean users to join the
nightly masscheck, but it is very difficult.
