https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155
--- Comment #134 from John Hardin <[email protected]> 2009-10-26 14:31:20 UTC ---
(In reply to comment #132)
> $ grep RCVD_IN_DNSWL_ freqs.full
> OVERALL   SPAM%     HAM%    S/O  RANK  SCORE  NAME
>   0.184  0.0005   0.5708  0.001  0.76  -1.80  RCVD_IN_DNSWL_HI
>   7.410  0.1094  22.7527  0.005  0.67  -1.20  RCVD_IN_DNSWL_MED
>   2.551  0.1810   7.5322  0.023  0.59  -1.10  RCVD_IN_DNSWL_LOW
>
> It is quite possible that some of these hits are still false positives,
> despite several iterations of cleaning:
>
>   for j in spam*.log; do echo -n $j; grep RCVD_IN_DNSWL_HI $j | \
>     wc -l; done | sort -k2nr
>
> spam-bayes-net-bb-jhardin.log 3
>
> same on _MED:
>
> spam-bayes-net-bb-jhardin.log 23

All but one of those are obvious spams, and I've removed the one questionable
one from my corpora.

Some of the spam in my corpora is from third parties. I do check it for
correct classification before uploading, but I was wondering: how does
masscheck determine the correct lastexternal for corpora containing messages
from multiple different networks? Or does it assume all of the messages in a
given contributor's corpora have the same network boundary? If the latter, I
need to remove those third-party messages from my spam corpora...

Might lastexternal confusion in the masschecks be contributing in some way to
the odd RCVD_IN_* score generation?
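
For anyone repeating the per-contributor count quoted from comment #132, here is a minimal sketch of the same loop wrapped in a function. The name count_rule_hits is hypothetical and not part of the masscheck tooling. It assumes each log line lists the rules that hit, and that log file names contain no spaces. Unlike the quoted one-liner it uses `grep -w`, so a longer rule name that merely contains the one asked for is not counted.

```shell
# Hypothetical helper (not part of SpamAssassin): print "<log> <hits>" for
# each log file given, sorted with the highest hit count first.
# Usage: count_rule_hits RULE_NAME log1 [log2 ...]
count_rule_hits() {
    rule=$1
    shift
    for log in "$@"; do
        # -c counts matching lines, -F treats the rule name as a fixed
        # string, -w requires it to match as a whole word.
        printf '%s %s\n' "$log" "$(grep -cFw -- "$rule" "$log")"
    done | sort -k2,2nr
}
```

It could be called as `count_rule_hits RCVD_IN_DNSWL_MED spam*.log`, and the highest counts show which corpora to re-check by hand first.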
