http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5686





------- Additional Comments From [EMAIL PROTECTED]  2007-11-01 16:22 -------
(In reply to comment #18)
> I'm rerunning now to establish another fix to that bug, that still displays an
> improvement, since the attempt in r588709 doesn't do that...

This proved really tricky.

After a full 10-fold cross-validation run, here's what the original (buggy)
code scores for two sample pairs of score thresholds:

SUMMARY: 0.30/0.70  fp     0 fn     9 uh   528 us  1445    c 206.30
SUMMARY: 0.20/0.80  fp     0 fn     0 uh  2378 us 17529    c 1990.70


It took a few days, but I've finally figured out a patch that (a) is not
buggy ;) and (b) gives better results:

SUMMARY: 0.30/0.70  fp     0 fn     7 uh   994 us   631    c 169.50
SUMMARY: 0.20/0.80  fp     0 fn     0 uh  3018 us  3295    c 631.30
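For anyone comparing the SUMMARY lines by hand, here is a minimal Python sketch that parses one and recomputes the "c" figure. It assumes fp/fn are false positives/negatives, uh/us are "unsure" ham/spam (messages scoring between the two thresholds), and that c is a weighted cost. The fn weight of 1 and the unsure weight of 0.1 reproduce all four c values above; the fp weight of 10 is an assumption that can't be checked against these figures, since fp is 0 in every line. The function and constant names are made up for illustration and are not from the SpamAssassin tree.

```python
import re

# Cost weights. FN_WEIGHT and UNSURE_WEIGHT reproduce the "c" values quoted
# in this comment; FP_WEIGHT is an assumption (fp is 0 in every quoted line).
FP_WEIGHT = 10.0
FN_WEIGHT = 1.0
UNSURE_WEIGHT = 0.1

SUMMARY_RE = re.compile(
    r"SUMMARY:\s+(?P<ham>[\d.]+)/(?P<spam>[\d.]+)\s+"
    r"fp\s+(?P<fp>\d+)\s+fn\s+(?P<fn>\d+)\s+"
    r"uh\s+(?P<uh>\d+)\s+us\s+(?P<us>\d+)\s+c\s+(?P<c>[\d.]+)"
)


def parse_summary(line):
    """Parse one SUMMARY line into a dict of thresholds and counts."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a SUMMARY line: %r" % line)
    d = m.groupdict()
    return {
        "ham_cutoff": float(d["ham"]),
        "spam_cutoff": float(d["spam"]),
        "fp": int(d["fp"]),
        "fn": int(d["fn"]),
        "uh": int(d["uh"]),
        "us": int(d["us"]),
        "c": float(d["c"]),
    }


def cost(row):
    """Recompute the cost figure from the counts (weights as assumed above)."""
    unsure = row["uh"] + row["us"]
    return (FP_WEIGHT * row["fp"]
            + FN_WEIGHT * row["fn"]
            + UNSURE_WEIGHT * unsure)


if __name__ == "__main__":
    lines = [
        "SUMMARY: 0.30/0.70  fp     0 fn     9 uh   528 us  1445    c 206.30",
        "SUMMARY: 0.30/0.70  fp     0 fn     7 uh   994 us   631    c 169.50",
    ]
    for line in lines:
        row = parse_summary(line)
        print(row["c"], round(cost(row), 2))
```

Read this way, most of the improvement at 0.30/0.70 comes from far fewer unsure spam messages (1445 down to 631), at the price of more unsure ham (528 up to 994).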

It includes a small hack: it scales the scores up by 10%, since EDDC and the
naive Bayes combiner seem to skew scores a little lower.  Results improve with
this; it'd probably be better to analyze the EDDC equation and figure out why
the scores aren't 10% higher to start with, but hey ;)
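To illustrate the effect of that kind of scaling (this is not the patch in r591167, just a sketch), here is a small Python example. It assumes the 10% factor is applied to the final combined score, that the result is clamped to [0, 1], and that scores strictly between the two thresholds count as "unsure". All names and the boundary handling are hypothetical.

```python
def scale_score(p, factor=1.10):
    """Scale a combined score up by `factor`, clamped to [0, 1].

    The clamp and the point at which scaling is applied are assumptions.
    """
    return max(0.0, min(1.0, p * factor))


def classify(p, ham_cutoff=0.30, spam_cutoff=0.70):
    """Three-way decision: below ham_cutoff is ham, above spam_cutoff is spam."""
    if p < ham_cutoff:
        return "ham"
    if p > spam_cutoff:
        return "spam"
    return "unsure"


if __name__ == "__main__":
    raw = 0.65
    print(classify(raw), classify(scale_score(raw)))
```

A spam message whose raw score sits just under the upper threshold moves out of the unsure band once scaled, which is the direction of the drop in "us" seen above. The flip side is that ham just under the lower threshold can be pushed into the unsure band.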

This is now the new baseline, checked in as r591167.


Here's the score histogram:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 (25.086%) ..........|.......................................................
0.040 ( 9.016%) ..........|....................
0.080 (16.146%) ..........|...................................
0.120 (23.593%) ..........|....................................................
0.160 (10.888%) ..........|........................
0.200 ( 5.976%) ..........|.............
0.200 ( 0.011%)           |
0.240 ( 4.265%) ..........|.........
0.240 ( 0.028%) #         |
0.280 ( 2.970%) ..........|.......
0.280 ( 0.011%)           |
0.320 ( 1.295%) ..........|...
0.320 ( 0.039%) #         |
0.360 ( 0.390%) ..........|.
0.360 ( 0.220%) ######    |
0.400 ( 0.106%) .....     |
0.400 ( 0.209%) ######    |
0.440 ( 0.040%) ..        |
0.440 ( 0.165%) #####     |
0.480 ( 0.121%) ###       |
0.520 ( 0.228%) ..........|
0.520 ( 1.361%) ##########|##
0.560 ( 0.072%) ##        |
0.600 ( 0.259%) #######   |
0.640 ( 0.612%) ##########|#
0.680 ( 0.970%) ##########|#
0.720 ( 2.750%) ##########|####
0.760 (11.332%) ##########|################
0.800 (38.261%) ##########|#####################################################
0.840 (40.074%) ##########|#######################################################
0.880 ( 3.390%) ##########|#####
0.920 ( 0.011%)           |
0.960 ( 0.105%) ###       |




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
