https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155

--- Comment #134 from John Hardin <[email protected]> 2009-10-26 14:31:20 UTC ---
(In reply to comment #132)

> $ grep RCVD_IN_DNSWL_ freqs.full
> OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>   0.184   0.0005   0.5708    0.001   0.76   -1.80  RCVD_IN_DNSWL_HI
>   7.410   0.1094  22.7527    0.005   0.67   -1.20  RCVD_IN_DNSWL_MED
>   2.551   0.1810   7.5322    0.023   0.59   -1.10  RCVD_IN_DNSWL_LOW
> 
> It is quite possible that some of these hits are still false positives,
> despite several iterations of cleaning:
> 
> for j in spam*.log; do echo -n $j; grep RCVD_IN_DNSWL_HI $j | \
>   wc -l; done | sort -k2nr
> 
> spam-bayes-net-bb-jhardin.log         3
> 
> same on _MED:
> 
> spam-bayes-net-bb-jhardin.log      23

All but one of those are obvious spam, and I've removed the one questionable
message from my corpora.
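
For anyone who wants to review the individual hits rather than just count
them, here is a minimal sketch. It assumes each result line in the log is
whitespace-separated as "<Y or .> <score> <message path> <comma-separated
rule names> [metadata]" and that comment lines start with "#"; adjust the
field numbers if your logs differ. The function name list_rule_hits is
hypothetical, and message paths containing spaces are not handled.

```shell
# list_rule_hits RULE LOGFILE...
# Print "logfile<TAB>message-path" for every log line whose rule list
# contains RULE as a whole name (so RCVD_IN_DNSWL_HI does not match
# a longer name that merely starts with it).
# Assumed line layout: <Y|.> <score> <path> <rule1,rule2,...> [metadata]
list_rule_hits() {
    rule=$1
    shift
    awk -v rule="$rule" '
        /^#/ { next }                      # skip comment lines
        {
            n = split($4, hits, ",")       # field 4: comma-separated rules
            for (i = 1; i <= n; i++) {
                if (hits[i] == rule) {
                    print FILENAME "\t" $3 # field 3: message path
                    break
                }
            }
        }
    ' "$@"
}

# Example: list_rule_hits RCVD_IN_DNSWL_HI spam*.log
```

Matching on whole rule names instead of a plain grep avoids counting a line
twice or picking up a rule whose name only shares a prefix.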

Some of the spam in my corpora is from third parties. I do check it for correct
classification before uploading, but I was wondering: how does masscheck
determine the correct lastexternal for corpora containing messages from
multiple different networks? Or does it assume all of the messages in a given
contributor's corpora have the same network boundary? If the latter, I need to
remove those third-party messages from my spam corpora...

Might lastexternal confusion in the masschecks be contributing in some way to
the odd RCVD_IN_* score generation?
