http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5711

           Summary: RFE: "mass-check --reuse-only" switch
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Masses
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


In order to be able to generate set1 scores nightly, we need a way to run
'mass-check --net' much faster than it currently runs.  In discussions on
dev@ [1], we've decided that the best way to do this would be to add a switch,
"--reuse-only", which only produces network-rule output for messages where the
reused-lookups info is valid.

[1]: Subject: "Re: Nightly score generation for all scoresets", Fri 19 Oct 2007

As Daryl said:

> I'd settle for a --reuse-only 
> run that includes all of your messages for set0 results and only 
> reusable messages for set1 results... all done in a single mass-check.

So this would have to produce 4 output files:

  - ham.log / spam.log = set0 mass-check output, containing set0 results
    for all messages

  - ham-set1.log / spam-set1.log ? = set1 mass-check output, containing only
    the set1 results for messages where reusable info was present

maybe there's a better UI for that though... suggestions?
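
To make the set1 half of that concrete, here is a rough sketch in Python (not
mass-check's own code, and not a proposed implementation) of picking out the
lines that would go into the set1 logs.  It assumes only what the greps in this
report rely on, namely that each result line carries a "reuse=yes" or
"reuse=no" marker.  The function name, the skipping of "#" comment lines, and
the sample lines are all hypothetical.

```python
def split_reusable(lines):
    """Partition log lines into (reusable, not_reusable) by their reuse marker.

    Assumption: a line with valid reused-lookups info contains the literal
    text "reuse=yes"; every other result line is treated as not reusable.
    Blank lines and lines starting with "#" are skipped (also an assumption).
    """
    reusable, not_reusable = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if "reuse=yes" in stripped:
            reusable.append(line)
        else:
            not_reusable.append(line)
    return reusable, not_reusable


if __name__ == "__main__":
    # Made-up lines for illustration only; real log lines have more fields.
    sample = [
        "# hypothetical header line",
        "Y 10 /path/msg1 RULE_A reuse=yes",
        ". 0 /path/msg2 RULE_B reuse=no",
    ]
    set1_lines, skipped = split_reusable(sample)
    print(len(set1_lines), "line(s) usable for set1,", len(skipped), "skipped")
```

Under this sketch the set0 logs would still cover every message; only the set1
logs are restricted to the reusable subset.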


For what it's worth, here are the counts:

: exit=0 Wed Oct 31 18:00:04 GMT 2007; cd /home/corpus-rsync/corpus
: jm 72...; grep reuse=yes spam-net-*.log | wc -l
  489909
: jm 73...; grep reuse=no spam-net-*.log | wc -l
  105134
: jm 76...; grep reuse=yes ham-*.log | wc -l
   66868
: exit=0 Wed Oct 31 18:01:48 GMT 2007; cd /home/corpus-rsync/corpus
: jm 77...; grep reuse=no ham-*.log | wc -l
  253814

490k reusable spams is pretty good, but 67k reusable hams not so much.  We
need to improve that, I'd say.
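
As fractions of each corpus, the counts above work out to roughly 82% of spam
and 21% of ham being reusable.  A small Python sketch of that arithmetic (the
function name is made up; the numbers are the grep counts quoted above):

```python
def reuse_fraction(reuse_yes, reuse_no):
    """Return the share of messages whose reused-lookups info is present."""
    total = reuse_yes + reuse_no
    return reuse_yes / total if total else 0.0


# Counts taken from the grep output in this report.
spam_fraction = reuse_fraction(489909, 105134)
ham_fraction = reuse_fraction(66868, 253814)

print(f"spam reusable: {spam_fraction:.1%}")
print(f"ham reusable:  {ham_fraction:.1%}")
```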


