[Bug 5376] RFE: generate a "SpamAssassin Challenge" score-generation test

bugzilla-daemon Tue, 07 Aug 2007 12:51:38 -0700

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376






------- Additional Comments From [EMAIL PROTECTED]  2007-08-07 12:51 -------
Btw, the CEAS 2007 contest used the following metrics:

  Filters will be evaluated using the lam() metric. Lam() calculates the
  average of a filter's false-negative and false-positive rates, but performs
  the calculation in logit space. The exact formula used for the competition
  will be: 
    FPrate = (#ham-errors  + 0.5) / (#ham  + 0.5)
    FNrate = (#spam-errors + 0.5) / (#spam + 0.5)
    lam = invlogit((logit(FPrate) + logit(FNrate)) / 2)
  where 
    logit(p) = log(p/(1-p))
    invlogit(x) = (exp(x))/(1 + exp(x))
  The winner will be the filter with the lowest lam() score.
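A minimal Python sketch of the lam() formula quoted above, transcribed directly from the CEAS definitions. The function and parameter names (lam, ham_errors, n_ham, spam_errors, n_spam) are illustrative, not taken from any CEAS or SpamAssassin code. The +0.5 smoothing keeps both rates away from 0. The code does not handle the degenerate case where every ham or every spam is misclassified: the smoothed rate then equals exactly 1 and logit() is undefined.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))


def invlogit(x: float) -> float:
    """Inverse of logit: maps a real number back into (0, 1)."""
    return math.exp(x) / (1.0 + math.exp(x))


def lam(ham_errors: int, n_ham: int, spam_errors: int, n_spam: int) -> float:
    """Logit-space average of the smoothed false-positive and false-negative rates."""
    fp_rate = (ham_errors + 0.5) / (n_ham + 0.5)    # ham misfiled as spam
    fn_rate = (spam_errors + 0.5) / (n_spam + 0.5)  # spam let through as ham
    return invlogit((logit(fp_rate) + logit(fn_rate)) / 2.0)


if __name__ == "__main__":
    # Hypothetical corpus: 10 of 10,000 ham misfiled, 50 of 10,000 spam missed.
    print(f"lam = {lam(10, 10000, 50, 10000):.6f}")
```

When the two smoothed rates are equal, lam() returns that common rate. Otherwise it always lies strictly between them.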


Here is an additional comment from Gordon V. Cormack
(on the [EMAIL PROTECTED] ML):


Lam is equivalent to diagnostic odds ratio which is
used extensively in the diagnostic test literature.
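The equivalence can be written out, assuming the diagnostic odds ratio (DOR) is computed from the same smoothed FPrate and FNrate used by lam(). The expression is symmetric in the two rates, so it does not matter which class is treated as "positive". Because e^{logit(p)} = p/(1-p):

```latex
\mathrm{DOR}
  = \frac{(1-\mathrm{FNrate})/\mathrm{FNrate}}{\mathrm{FPrate}/(1-\mathrm{FPrate})}
  = \frac{1}{e^{\operatorname{logit}(\mathrm{FPrate})}\, e^{\operatorname{logit}(\mathrm{FNrate})}},
\qquad
\ln \mathrm{DOR} = -\bigl(\operatorname{logit}(\mathrm{FPrate}) + \operatorname{logit}(\mathrm{FNrate})\bigr)

\mathrm{lam}
  = \operatorname{invlogit}\!\left(-\tfrac{1}{2}\ln \mathrm{DOR}\right)
  = \frac{1}{1 + \sqrt{\mathrm{DOR}}}
```

Since lam is a strictly decreasing function of DOR, ranking filters by lowest lam gives the same order as ranking them by highest DOR.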

Intuitively, it rewards a fixed factor improvement
in either spam or ham misclassification rate by
the same amount.  So improving from .001 to .0009
fp gives the same score gain as from .01 to .009 fn.
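The numbers in Cormack's example can be checked with a short Python sketch. It measures how far each rate moves in logit space when it is cut by 10%. lam() averages the two logits, so each gain shifts lam's argument by half this amount. Both gains come out close to ln(10/9) ≈ 0.1054. They differ only through the small (1 - p) term in the log-odds, which is why the equality is approximate rather than exact.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# Logit-space gain from a 10% relative improvement in each error rate.
fp_gain = logit(0.001) - logit(0.0009)  # false-positive rate .001 -> .0009
fn_gain = logit(0.01) - logit(0.009)    # false-negative rate .01 -> .009

print(f"fp gain: {fp_gain:.5f}")
print(f"fn gain: {fn_gain:.5f}")
print(f"ln(10/9): {math.log(10 / 9):.5f}")
```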

Lam is *almost* the same thing as the geometric
mean of the two error rates, which is perhaps more
intuitive for most.
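The "almost" can be made precise. Lam's odds, lam/(1 - lam), equal *exactly* the geometric mean of the two error odds, p/(1 - p). At small error rates p/(1 - p) ≈ p, so lam ≈ sqrt(FPrate * FNrate). The following Python sketch compares the two. The rate pairs are arbitrary illustrative values, not data from any contest.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def lam_from_rates(fp_rate: float, fn_rate: float) -> float:
    """lam() applied to already-computed (smoothed) error rates."""
    x = (logit(fp_rate) + logit(fn_rate)) / 2.0
    return math.exp(x) / (1.0 + math.exp(x))


# Illustrative (fp, fn) pairs: the gap grows as the rates get larger.
for fp, fn in [(0.001, 0.01), (0.01, 0.05), (0.1, 0.3)]:
    gm = math.sqrt(fp * fn)
    print(f"fp={fp:<6} fn={fn:<6} lam={lam_from_rates(fp, fn):.6f} geomean={gm:.6f}")
```

For error rates in the range a usable spam filter operates in, the two measures agree to within a fraction of a percent. They diverge only when the rates are large.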

Lam is discussed in the TREC 2005 proceedings
  http://plg.uwaterloo.ca/~gvcormac/trecspamtrack05/

and DOR in the medical literature
  http://www.bmj.com/cgi/content/full/323/7305/157

In a nutshell, we are looking for a threshold-
independent measure that roughly captures the
overall discriminative ability of a filter.  It
turns out that lam works pretty well for this.
Better than, say, "accuracy", which is wildly
threshold dependent; optimizing accuracy may
be totally inconsistent with good filtering.
While lam does not explicitly reward low fp
more than low fn, it does not actively
discourage it.

Other measures -- like ROC Area Under the
Curve -- could have been used but that would
have required filters to return scores rather
than categorical ham/spam judgements and we
considered this logistically unrealistic.




