https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155
--- Comment #10 from Justin Mason <[email protected]> 2009-08-18 01:15:46 PST ---
(In reply to comment #9)
> http://ruleqa.spamassassin.org/20090817-r804903-n/TVD_SPACE_RATIO/detail
> 90% FP rate for Japanese
> http://ruleqa.spamassassin.org/20090817-r804903-n/PLING_QUERY/detail
> 52% FP rate for Japanese
> http://ruleqa.spamassassin.org/20090817-r804903-n/GAPPY_SUBJECT/detail
> 44% FP rate for Japanese
>
> All three of these rules do very poorly with Japanese mail, and the total %
> SPAM is lower than the % FP. Yet the GA scores are rather high since we don't
> have a statistically significant amount of Japanese mail in the corpus.
>
> What language are the SPAM hits? Perhaps many are examples of identifying
> foreign languages instead of determining if it is ham or spam?
>
> Bug #6149 is related to this problem.

I plan to fix that, alright.

> I am attempting to convince Japanese, Chinese and Korean users to join the
> nightly masscheck, but it is very difficult.

BTW, you could also take copies of their mail samples and add them to
your own corpora, in effect acting as a proxy for them. That's easier
for them than setting up all the infrastructure. (I thought you were
already doing this ;)

You may need to be able to ask them if a mail _really_ is ham, down the
line, though, so it needs to remain a two-way arrangement.
