http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270
------- Additional Comments From [EMAIL PROTECTED] 2007-02-22 06:55 -------
> annoyingly, the basic FP/FN rate when I run fp-fn-statistics *now* for set 3 is
> 0.1% higher in FNs than when I ran it during the score generation :(
>
> need to figure out wtf is up there before I can twiddle the IADB and DNSWL
> scores.
aha, I think I have it. there are certain rules like FM_MORTGAGE6PLUS that hit
enough mail in the nightly-mass-check to be promoted, but didn't hit any (recent
enough?) mail in the rescoring check to get into the perceptron input:
masses/gen-set0-2.0-5.0-100-ga/freqs: 0.000 0.0003 0.0000 1.000 0.51 0.00 FM_MORTGAGE6PLUS
masses/gen-set0-2.0-5.0-100-ga/make.output:rule FM_MORTGAGE6PLUS: immutable and zero due to low hitrate
masses/gen-set0-2.0-5.0-100-ga/make.output:ignoring 'FM_MORTGAGE6PLUS': score and range == 0
so this then got ignored by the perceptron. however, the hits are still in the
logs, and the rule is still in 72_active.cf. after the perceptron completes,
rewrite-scores does not add a line to 50_scores.cf for this rule (because it's
not in the scores file output by perceptron).
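the gap described above can be sketched roughly as follows. this is Python for illustration only, not the real Perl scripts in masses/; the function name, the second rule name and its score are hypothetical stand-ins:

```python
def rules_missing_scores(active_rules, perceptron_scores):
    """Return active rules that never appear in the perceptron's score output.

    active_rules: set of rule names (stand-in for what 72_active.cf defines)
    perceptron_scores: dict of rule name -> score (stand-in for the scores file)
    """
    return sorted(rule for rule in active_rules if rule not in perceptron_scores)


# Hypothetical data: only FM_MORTGAGE6PLUS comes from the report above.
active_rules = {"FM_MORTGAGE6PLUS", "HYPOTHETICAL_RULE_A"}
perceptron_scores = {"HYPOTHETICAL_RULE_A": 2.3}

# Rules listed here would get no line written to the scores file.
missing = rules_missing_scores(active_rules, perceptron_scores)
print(missing)
```

any rule that shows up in that list is still active and still has hits in the logs, but nothing downstream ever assigns it a score.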
later, when fp-fn-statistics is run, parse-rules-for-masses is run in turn,
generating a rules.pl file containing:
'FM_MORTGAGE6PLUS' => {
    'lang' => '',
    'score' => '1',
    'describe' => 'Looks like a mortgage spam (6+)',
    'tflags' => '',
    'type' => 'meta',
    'issubrule' => '0',
    'mutable' => 1,
    'eval' => '0',
    'depends' => [
        '__FM_MORTGAGE6PLUS'
    ],
    'code' => '(__FM_MORTGAGE6PLUS)'
},
note: a score of 1! this is because the rule exists in 72_active.cf but has no
score line in 50_scores.cf, so it ends up with a score of 1 instead of the 0
the perceptron run assumed. fp-fn-statistics then uses that score to compute
its accuracy rates.
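a minimal sketch of that fallback, in Python for illustration. the default of 1 is taken from the rules.pl entry above; the function name and the dict-based lookup are assumptions, not how parse-rules-for-masses is actually written:

```python
# Assumed fallback for a rule with no explicit score line,
# matching the 'score' => '1' seen in the rules.pl entry.
DEFAULT_SCORE = 1.0


def effective_score(rule, explicit_scores):
    """Return the explicit score if one exists, else the fallback."""
    return explicit_scores.get(rule, DEFAULT_SCORE)


# Hypothetical stand-in for the contents of 50_scores.cf.
explicit_scores = {"HYPOTHETICAL_RULE_A": 2.3}

print(effective_score("HYPOTHETICAL_RULE_A", explicit_scores))
print(effective_score("FM_MORTGAGE6PLUS", explicit_scores))
```

the point is only that "no score line" and "score of 0" are not the same thing: the first silently becomes 1.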
this then accounts for the difference, I'd say; a few rules like that with
0.0003% hitrates, and scores changing from 0.0 to 1.0, could add up to ~0.1% FN
improvement and ~0.01% additional FPs.
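to make the mechanism concrete, here is a toy illustration in Python. the four messages and their base scores are invented, and the 5.0 threshold is an assumption; it only shows how moving one rule from 0.0 to 1.0 can flip borderline messages, not the real corpus numbers:

```python
THRESHOLD = 5.0  # assumed spam threshold


def fp_fn(messages, rule_score):
    """Count (false positives, false negatives) for a given score of one rule.

    messages: list of (is_spam, base_score, hits_rule) tuples, where
    base_score is the total from all other rules.
    """
    fp = fn = 0
    for is_spam, base_score, hits_rule in messages:
        total = base_score + (rule_score if hits_rule else 0.0)
        flagged = total >= THRESHOLD
        if flagged and not is_spam:
            fp += 1
        elif is_spam and not flagged:
            fn += 1
    return fp, fn


# Invented messages: (is_spam, base_score, hits_rule)
messages = [
    (True, 4.5, True),    # borderline spam that hits the rule
    (True, 6.0, False),   # clear spam, rule not involved
    (False, 4.2, True),   # borderline ham that hits the rule
    (False, 1.0, False),  # clear ham, rule not involved
]

print(fp_fn(messages, 0.0))  # (0, 1): the borderline spam is missed
print(fp_fn(messages, 1.0))  # (1, 0): spam caught, borderline ham flagged
```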
to fix: we need to indicate that these rules were immutable and zeroed, so that
rewrite-scores will add a score of 0 for them to 50_scores.cf after the
perceptron is run.
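one possible shape for that fix, sketched in Python rather than as a patch to rewrite-scores. the function and its arguments are hypothetical, and the output is simplified to a single score per rule:

```python
def score_lines(perceptron_scores, zeroed_rules):
    """Build score lines, adding an explicit 0 for every zeroed rule.

    perceptron_scores: dict of rule name -> score from the perceptron
    zeroed_rules: rules marked immutable and zero before the perceptron ran
    """
    merged = dict(perceptron_scores)
    for rule in zeroed_rules:
        # Do not override a score the perceptron did produce.
        merged.setdefault(rule, 0.0)
    return [f"score {name} {value:g}" for name, value in sorted(merged.items())]


# Hypothetical inputs: only FM_MORTGAGE6PLUS comes from the report above.
lines = score_lines({"HYPOTHETICAL_RULE_A": 2.3}, ["FM_MORTGAGE6PLUS"])
for line in lines:
    print(line)
```

with an explicit 0 written out, a later rules.pl would no longer pick up a score of 1 for these rules.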