http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270





------- Additional Comments From [EMAIL PROTECTED]  2007-02-22 06:55 -------
> annoyingly, the basic FP/FN rate when I run fp-fn-statistics *now* for set 3
> is 0.1% higher in FNs than when I ran it during the score generation :(
>
> need to figure out wtf is up there before I can twiddle the IADB and DNSWL
> scores.

aha, I think I have it.  there are certain rules like FM_MORTGAGE6PLUS that hit
enough mail in the nightly-mass-check to be promoted, but didn't hit any (recent
enough?) mail in the rescoring check to get into the perceptron input:

masses/gen-set0-2.0-5.0-100-ga/freqs:  0.000   0.0003   0.0000    1.000   0.51   0.00  FM_MORTGAGE6PLUS
masses/gen-set0-2.0-5.0-100-ga/make.output:rule FM_MORTGAGE6PLUS: immutable and zero due to low hitrate
masses/gen-set0-2.0-5.0-100-ga/make.output:ignoring 'FM_MORTGAGE6PLUS': score and range == 0

so this then got ignored by the perceptron.  however, the hits are still in the
logs, and the rule is still in 72_active.cf.  after the perceptron completes,
rewrite-scores does not add a line to 50_scores.cf for this rule (because it's
not in the scores file output by perceptron).

when fp-fn-statistics is run later, it runs parse-rules-for-masses in turn,
which generates a rules.pl file containing:

           'FM_MORTGAGE6PLUS' => {
                                   'lang' => '',
                                   'score' => '1',
                                   'describe' => 'Looks like a mortgage spam (6+)',
                                   'tflags' => '',
                                   'type' => 'meta',
                                   'issubrule' => '0',
                                   'mutable' => 1,
                                   'eval' => '0',
                                   'depends' => [
                                                  '__FM_MORTGAGE6PLUS'
                                                ],
                                   'code' => '(__FM_MORTGAGE6PLUS)'
                                 },

note: a score of 1!  this is because the rule exists in 72_active.cf, but has no
score in 50_scores.cf.  fp-fn-statistics then uses that to compute its accuracy
rates.
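
the fallback described above can be sketched roughly as follows.  this is an
illustration in Python, not code from the masses tools: effective_score,
message_score, DEFAULT_SCORE, SOME_OTHER_RULE and its 4.5 score are all
hypothetical; only the default of 1 is taken from the rules.pl output above.

```python
# Illustrative only: models how a rule with no explicit score line
# ends up with a default score. All names here are hypothetical.
DEFAULT_SCORE = 1.0  # assumption: matches the 'score' => '1' seen in rules.pl


def effective_score(rule, explicit_scores):
    """Return the explicit score if one exists, else the default."""
    return explicit_scores.get(rule, DEFAULT_SCORE)


def message_score(hits, explicit_scores):
    """Sum the effective scores of every rule that hit a message."""
    return sum(effective_score(rule, explicit_scores) for rule in hits)


# Scores as written out after the perceptron run: FM_MORTGAGE6PLUS is absent.
scores_without_line = {"SOME_OTHER_RULE": 4.5}  # made-up example score
# Scores with an explicit zero recorded for the ignored rule.
scores_with_zero = {"SOME_OTHER_RULE": 4.5, "FM_MORTGAGE6PLUS": 0.0}

hits = ["SOME_OTHER_RULE", "FM_MORTGAGE6PLUS"]
print(message_score(hits, scores_without_line))  # 5.5
print(message_score(hits, scores_with_zero))     # 4.5
```

assuming a spam threshold of 5.0, the same message lands on different sides of
the threshold depending on whether the rule has an explicit zero or not.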

this then accounts for the difference, I'd say; a few rules like that with
0.0003% hitrates, and scores changing from 0.0 to 1.0, could add up to ~0.1% FN
improvement and ~0.01% additional FPs.

to fix: we need to indicate that these rules were immutable and zeroed, so that
rewrite-scores will add a score of 0 for them to 50_scores.cf after the
perceptron is run.
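
a rough sketch of that fix, in Python for illustration only.  the function
name, its arguments, and the idea of passing the zeroed rules as a set are
assumptions; this is not the actual rewrite-scores code.  the only detail taken
from the text above is that each ignored rule gets an explicit score of 0.

```python
def scores_with_zeroed_rules(perceptron_scores, zeroed_rules):
    """Build 'score' lines, adding an explicit 0 for each zeroed rule.

    perceptron_scores: dict of rule name -> score from the perceptron output
    zeroed_rules: names of rules marked immutable and zero before the run
    (both are hypothetical inputs for this sketch)
    """
    merged = dict(perceptron_scores)
    for rule in zeroed_rules:
        # Do not override a rule the perceptron actually scored.
        merged.setdefault(rule, 0.0)
    return ["score %s %g" % (name, merged[name]) for name in sorted(merged)]


lines = scores_with_zeroed_rules(
    {"SOME_OTHER_RULE": 4.5},  # made-up example score
    {"FM_MORTGAGE6PLUS"},
)
print("\n".join(lines))
```

with an explicit zero line present, a later parse of the rules would no longer
fall back to the default score for FM_MORTGAGE6PLUS.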



