http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
------- Additional Comments From [EMAIL PROTECTED] 2007-08-14 06:03 -------

here's a version of lam() with a lambda calculation...

#!/usr/bin/perl
# http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376#c16
# usage: <script> lambda fp% fn% nspam nham
use strict;
use warnings;

my ($lambda, $fppc, $fnpc, $nspam, $nham) = @ARGV;

# convert percentages to rates; the 0.5 offsets keep a 0% rate from giving log(0)
my $fprate = ((($fppc * $nham) / 100) + 0.5) / ($nham + 0.5);
my $fnrate = ((($fnpc * $nspam) / 100) + 0.5) / ($nspam + 0.5);

sub logit { my $p = shift; return log($p / (1-$p)); }
sub invlogit { my $x = shift; return exp($x) / (1 + exp($x)); }

# lambda-weighted mean of the two rates in logit space, mapped back to a rate
my $llam = invlogit (($lambda * logit($fprate) + logit($fnrate)) / ($lambda + 1));
print "Llam(l=$lambda, fp=$fppc, fn=$fnpc, ns=$nspam, nh=$nham): $llam\n";

some results:

Llam(l=10, fp=1, fn=5, ns=10000, nh=10000): 0.011653428823489
Llam(l=10, fp=2, fn=5, ns=10000, nh=10000): 0.0218100293576761
Llam(l=10, fp=5, fn=1, ns=10000, nh=10000): 0.0433911604792563

so it avoids the problem with the original lam(). however:

Llam(l=10, fp=1.5, fn=5, ns=10000, nh=10000): 0.0168115579465237
Llam(l=10, fp=1.0, fn=20, ns=10000, nh=10000): 0.0134020941849576

I think this is a problem. IMO a 20% FN rate/1.0% FP rate should not score
better than 5% FN/1.5% FP. it's a good drop in the FP rate, sure -- but a
filter with 20% FNs is unusable. :(

so:

TCR: varies widely based on the size of the corpora
F(): good FP rates can mask terrible FN rates
lam(): treats FPs and FNs as equal, no concept of lambda
Llam(): again, good FP rates can mask terrible FN rates

we still don't have a good single-figure metric imo.
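
To compare the two problem cases side by side, here is a minimal sketch that wraps the same formula as the script above in a function. The name llam() and its argument order are assumptions made for illustration; they are not part of any SpamAssassin tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub logit    { my $p = shift; return log($p / (1 - $p)); }
sub invlogit { my $x = shift; return exp($x) / (1 + exp($x)); }

# Hypothetical wrapper around the formula from the script above.
# Arguments: lambda, FP%, FN%, number of spam, number of ham.
sub llam {
    my ($lambda, $fppc, $fnpc, $nspam, $nham) = @_;
    my $fprate = ((($fppc * $nham) / 100) + 0.5) / ($nham + 0.5);
    my $fnrate = ((($fnpc * $nspam) / 100) + 0.5) / ($nspam + 0.5);
    return invlogit(($lambda * logit($fprate) + logit($fnrate)) / ($lambda + 1));
}

# The two filters compared in the comment, as [FP%, FN%] pairs.
for my $case ([1.5, 5], [1.0, 20]) {
    my ($fp, $fn) = @$case;
    printf "fp=%.1f%% fn=%2d%%  Llam=%.6f\n",
        $fp, $fn, llam(10, $fp, $fn, 10000, 10000);
}
```

With lambda=10 the FP logit carries 10/11 of the weight in the average, which is why going from 5% to 20% FNs moves the result less than going from 1.5% to 1.0% FPs does, and the second filter comes out with the lower (better) figure.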
