http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5686





------- Additional Comments From [EMAIL PROTECTED]  2007-10-17 15:03 -------
OK, I've now implemented osbf-lua-style OSBF, with EDDC (Exponential
Differential Document Count), as r584760. (Note that r584432 described above
wasn't OSBF -- it was just OSB ;)

The test took too long, ;) so I interrupted it after 5 of the 10 folds;
this histogram should be roughly representative:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 ( 5.820%) ..........|..........
0.040 ( 6.225%) ..........|..........
0.080 ( 3.998%) ..........|.......
0.120 (20.749%) ..........|..................................
0.160 (33.198%) ..........|.......................................................
0.200 (18.370%) ..........|..............................
0.200 ( 0.055%) #         |
0.240 ( 9.565%) ..........|................
0.280 ( 1.721%) ..........|...
0.280 ( 0.331%) #####     |
0.320 ( 0.101%) ...       |
0.320 ( 0.110%) ##        |
0.360 ( 0.101%) ...       |
0.360 ( 0.331%) #####     |
0.400 ( 0.110%) ##        |
0.440 ( 0.496%) #######   |
0.480 ( 0.152%) .....     |
0.480 ( 1.929%) ##########|#
0.520 ( 1.103%) ##########|#
0.560 ( 1.213%) ##########|#
0.600 ( 1.323%) ##########|#
0.640 ( 0.717%) ##########|#
0.680 ( 8.434%) ##########|######
0.720 (77.233%) ##########|#######################################################
0.760 ( 6.615%) ##########|#####


Note that the fundamental shape has changed, since OSBF uses a traditional
naive Bayesian combiner, instead of the binary Winnow style, or the
nearly-binary Robinsonian chi-square combiner.  I think OSBF is what's behind
the impressively low number of FPs and FNs, though!
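
For anyone not familiar with the term, a textbook naive Bayesian combiner
looks roughly like the sketch below. This is not the code from r584760, just
a generic illustration: the function name is made up, it assumes equal ham and
spam priors, and it expects per-token probabilities strictly between 0 and 1.

```python
import math


def naive_bayes_combine(probs):
    """Combine per-token spam probabilities, assuming token independence.

    Hypothetical helper for illustration only; assumes equal priors and
    0 < p < 1 for every token probability.
    """
    # Sum logs instead of multiplying, so long token lists do not underflow.
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)

    # P(spam) = S / (S + H) = 1 / (1 + exp(log H - log S)).
    diff = log_ham - log_spam
    if diff > 700.0:
        # exp() would overflow; the result is indistinguishable from 0.
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    print(naive_bayes_combine([0.9, 0.9]))
    print(naive_bayes_combine([0.5, 0.5]))
```

Each token contributes one factor to both the spam product and the ham
product, and the final score is the normalized spam product.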

Here are the numbers:

Threshold optimization for hamcutoff=0.30, spamcutoff=0.70: cost=$21.00
Total ham:spam:   1976:1814
FP:     0 0.000%    FN:     1 0.055%
Unsure:   200 5.277%     (ham:    41 2.075%    spam:   159 8.765%)
TCRs:              l=1 11.338    l=5 11.338    l=9 11.338
SUMMARY: 0.30/0.70  fp     0 fn     1 uh    41 us   159    c 21.00
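
As a sanity check, the figures above can be reproduced with a few lines of
Python. This is only a sketch: the cost weights (10 per FP, 1 per FN, 0.1 per
unsure) and the TCR formula (unsure spam counted as missed) are assumptions
inferred from the fact that they reproduce the reported values, not taken
from the test scripts, and the function name is made up.

```python
def summarize(ham, spam, fp, fn, unsure_ham, unsure_spam,
              fp_weight=10.0, fn_weight=1.0, unsure_weight=0.1):
    """Recompute cost, percentages and TCR from raw counts.

    The weights and the TCR definition are assumptions chosen to match
    the reported output; they are not read from the original scripts.
    """
    unsure = unsure_ham + unsure_spam
    total = ham + spam

    # Weighted cost: FPs are expensive, unsure results are cheap.
    cost = fp * fp_weight + fn * fn_weight + unsure * unsure_weight

    def tcr(lam):
        # Total cost ratio: spam count over (lambda-weighted FPs plus
        # spam that was not caught, i.e. FNs and unsure spam).
        return spam / (lam * fp + fn + unsure_spam)

    return {
        "cost": cost,
        "fp_pct": 100.0 * fp / ham,
        "fn_pct": 100.0 * fn / spam,
        "unsure_pct": 100.0 * unsure / total,
        "unsure_ham_pct": 100.0 * unsure_ham / ham,
        "unsure_spam_pct": 100.0 * unsure_spam / spam,
        "tcr": {lam: tcr(lam) for lam in (1, 5, 9)},
    }


if __name__ == "__main__":
    result = summarize(ham=1976, spam=1814, fp=0, fn=1,
                       unsure_ham=41, unsure_spam=159)
    for key, value in result.items():
        print(key, value)
```

With zero FPs the lambda weighting has nothing to act on, which would explain
why the l=1, l=5 and l=9 TCRs are all identical.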

I think I need to keep working on this...




