http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5711
Summary: RFE: "mass-check --reuse-only" switch
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: enhancement
Priority: P5
Component: Masses
AssignedTo: [email protected]
ReportedBy: [EMAIL PROTECTED]
To generate set1 scores nightly, we need a way to run 'mass-check --net' much
faster than it currently runs. In discussions on dev@ [1], we've decided that
the best way to do this would be to add a switch, "--reuse-only", which only
produces network-rule output for messages where the reused-lookups info is
valid.
[1]: Subject: "Re: Nightly score generation for all scoresets", Fri 19 Oct 2007
As Daryl said:
> I'd settle for a --reuse-only
> run that includes all of your messages for set0 results and only
> reusable messages for set1 results... all done in a single mass-check.
So this would have to produce 4 output files:
- ham.log / spam.log = set0 mass-check output, containing set0 mass-check
results for all messages
- ham-set1.log / spam-set1.log ? = set1 mass-check output, containing only the
set1 results for messages where reusable info was present?
Maybe there's a better UI for that, though... suggestions?
For what it's worth, here are the counts:
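To make the proposed output split concrete, here is a minimal post-processing sketch in Python. It is not the proposed switch itself and not mass-check code: it only assumes that each result line in a log carries a literal "reuse=yes" or "reuse=no" token (as the grep commands in this report suggest) and that comment lines start with "#". The function name and the sample lines are hypothetical. It also does not strip network-rule hits from the "all messages" list, since it has no way to know which rules are network rules.

```python
from typing import Iterable, List, Tuple


def split_by_reuse(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (all_results, reusable_results) from mass-check-style log lines.

    Assumption: a result line is "reusable" if it contains "reuse=yes".
    """
    all_results: List[str] = []
    reusable: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        all_results.append(line)
        if "reuse=yes" in line:
            reusable.append(line)
    return all_results, reusable


# Hypothetical log excerpt, for illustration only.
sample = [
    "# hypothetical mass-check log excerpt",
    "Y 12 /corpus/spam/1 RULE_A,RULE_B time=1,reuse=yes",
    "Y  9 /corpus/spam/2 RULE_A time=2,reuse=no",
    "Y 15 /corpus/spam/3 RULE_C time=3,reuse=yes",
]

all_results, reusable = split_by_reuse(sample)
print(len(all_results), "results in total,", len(reusable), "with reusable info")
```

In this sketch the first list corresponds to what would feed ham.log / spam.log and the second to what would feed the set1 logs.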
: exit=0 Wed Oct 31 18:00:04 GMT 2007; cd /home/corpus-rsync/corpus
: jm 72...; grep reuse=yes spam-net-*.log | wc -l
489909
: jm 73...; grep reuse=no spam-net-*.log | wc -l
105134
: jm 76...; grep reuse=yes ham-*.log | wc -l
66868
: exit=0 Wed Oct 31 18:01:48 GMT 2007; cd /home/corpus-rsync/corpus
: jm 77...; grep reuse=no ham-*.log | wc -l
253814
490k spams is pretty good, but 66k hams not so much. We need to
improve that I'd say.
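As a quick check on those counts, the reusable share of each corpus can be worked out from the four grep totals above. This is only arithmetic on the reported numbers; the helper name is made up for illustration.

```python
def reuse_fraction(reuse_yes: int, reuse_no: int) -> float:
    """Fraction of log lines that carry reusable network-lookup info."""
    total = reuse_yes + reuse_no
    if total == 0:
        return 0.0
    return reuse_yes / total


# Counts copied from the grep output in this report.
spam_fraction = reuse_fraction(489909, 105134)
ham_fraction = reuse_fraction(66868, 253814)

print(f"spam reusable: {spam_fraction:.1%}")  # 82.3%
print(f"ham reusable:  {ham_fraction:.1%}")   # 20.9%
```

So roughly four in five spam results could be reused, against about one in five ham results.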