http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5686
------- Additional Comments From [EMAIL PROTECTED] 2007-10-17 15:03 -------

OK, I've now implemented osbf-lua-style OSBF, with EDDC (Exponential
Differential Document Count), as r584760. (Note that r584432 described above
wasn't OSBF -- it was just OSB ;)

The test took too long. ;) I interrupted it after 5 of the 10 folds; this
histogram is about representative:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 ( 5.820%) ..........|..........
0.040 ( 6.225%) ..........|..........
0.080 ( 3.998%) ..........|.......
0.120 (20.749%) ..........|..................................
0.160 (33.198%) ..........|.......................................................
0.200 (18.370%) ..........|..............................
0.200 ( 0.055%) #         |
0.240 ( 9.565%) ..........|................
0.280 ( 1.721%) ..........|...
0.280 ( 0.331%) #####     |
0.320 ( 0.101%) ...       |
0.320 ( 0.110%) ##        |
0.360 ( 0.101%) ...       |
0.360 ( 0.331%) #####     |
0.400 ( 0.110%) ##        |
0.440 ( 0.496%) #######   |
0.480 ( 0.152%) .....     |
0.480 ( 1.929%) ##########|#
0.520 ( 1.103%) ##########|#
0.560 ( 1.213%) ##########|#
0.600 ( 1.323%) ##########|#
0.640 ( 0.717%) ##########|#
0.680 ( 8.434%) ##########|######
0.720 (77.233%) ##########|#######################################################
0.760 ( 6.615%) ##########|#####

Note that the fundamental shape has changed, since OSBF uses a traditional
naive Bayesian combiner, instead of the binary Winnow style, or the
nearly-binary Robinsonian chi-square combiner.

OSBF however is behind the impressively low number of FPs and FNs, I think,
though! Here's the numbers:

Threshold optimization for hamcutoff=0.30, spamcutoff=0.70: cost=$21.00
Total ham:spam:   1976:1814
FP:     0 0.000%    FN:     1 0.055%
Unsure:   200 5.277%     (ham:    41 2.075%    spam:   159 8.765%)
TCRs:              l=1 11.338    l=5 11.338    l=9 11.338
SUMMARY: 0.30/0.70  fp     0 fn     1 uh    41 us   159    c 21.00

I think I need to keep working on this...
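
For anyone wanting to cross-check the summary figures above, here is a rough Python sketch. The counts are copied from the SUMMARY line; everything else is an assumption on my part, not something the comment states. In particular, the cost weights ($10 per FP, $1 per FN, $0.10 per unsure message) are simply a set that happens to reproduce cost=$21.00, and the TCR formula (unsure spam counted as missed spam, unsure ham ignored) is one reading that lands on roughly 11.338. The function names are made up for illustration and are not the names used by the actual test scripts.

```python
# Counts copied from the SUMMARY line in the comment above.
ham_total, spam_total = 1976, 1814
fp, fn = 0, 1
unsure_ham, unsure_spam = 41, 159

# ASSUMPTION: these weights are not given in the comment; they are one
# choice that reproduces the reported cost of $21.00.
FP_COST, FN_COST, UNSURE_COST = 10.0, 1.0, 0.1


def cost(fp, fn, unsure):
    """Weighted cost of false positives, false negatives and unsures."""
    return FP_COST * fp + FN_COST * fn + UNSURE_COST * unsure


def pct(part, whole):
    """Percentage of `part` within `whole`."""
    return 100.0 * part / whole


def tcr(lam):
    """Total cost ratio for FP weight `lam`.

    ASSUMPTION: unsure spam is treated as missed spam and unsure ham is
    ignored. With fp == 0 the result does not depend on `lam`, which is
    consistent with the three identical TCR values in the report.
    """
    return spam_total / (lam * fp + fn + unsure_spam)


if __name__ == "__main__":
    unsure = unsure_ham + unsure_spam
    print("cost   $%.2f" % cost(fp, fn, unsure))
    print("FN     %.3f%%" % pct(fn, spam_total))
    print("unsure %.3f%%" % pct(unsure, ham_total + spam_total))
    for lam in (1, 5, 9):
        print("TCR l=%d %.4f" % (lam, tcr(lam)))
```

If the real scripts use different weights or a different TCR definition, only the constants and the body of `tcr` would need changing; the percentages follow directly from the counts.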
