http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5686

------- Additional Comments From [EMAIL PROTECTED]  2007-10-20 07:19 -------
OK, some more tests.

Trying the (crazy) K3=20 with the Bayes chain rule combiner:
SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 (100.000%) ..........|.......................................................
0.000 ( 1.047%) ##########|#
0.200 ( 0.110%) #         |
0.320 ( 0.055%) #         |
0.680 ( 0.055%) #         |
0.800 ( 0.055%) #         |
0.840 ( 0.055%) #         |
0.880 ( 0.165%) ##        |
0.920 ( 0.110%) #         |
0.960 (98.346%) ##########|#######################################################

Let's try the naive Bayes combiner, K3=0.8:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.120 (32.439%) ..........|......................................................
0.160 (32.844%) ..........|.......................................................
0.160 ( 0.055%) #         |
0.200 (27.379%) ..........|..............................................
0.240 ( 6.275%) ..........|...........
0.280 ( 0.658%) ..........|.
0.280 ( 0.331%) ######    |
0.320 ( 0.152%) .....     |
0.320 ( 0.221%) ####      |
0.360 ( 0.051%) ..        |
0.400 ( 0.202%) .......   |
0.400 ( 0.110%) ##        |
0.440 ( 0.331%) ######    |
0.480 ( 0.496%) ######### |
0.520 ( 0.331%) ######    |
0.560 ( 1.378%) ##########|#
0.600 (15.160%) ##########|##############
0.640 (57.938%) ##########|#######################################################
0.680 ( 2.426%) ##########|##
0.720 (20.066%) ##########|###################
0.760 ( 1.047%) ##########|#
0.840 ( 0.110%) ##        |


So far I think K3=1, with the traditional naive Bayes combiner, is
working best for us, since it avoids the FPs and FNs that the
other settings leave behind.

To compare with the figures from comment 1, here are the results from a full
10-fold cross validation:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 (25.415%) ..........|.......................................................
0.040 ( 9.831%) ..........|.....................
0.080 (22.571%) ..........|.................................................
0.120 (21.716%) ..........|...............................................
0.160 ( 8.435%) ..........|..................
0.200 ( 5.444%) ..........|............
0.200 ( 0.028%) #         |
0.240 ( 3.916%) ..........|........
0.240 ( 0.022%) #         |
0.280 ( 1.801%) ..........|....
0.280 ( 0.022%) #         |
0.320 ( 0.491%) ..........|.
0.320 ( 0.226%) #####     |
0.360 ( 0.116%) .....     |
0.360 ( 0.231%) ######    |
0.400 ( 0.040%) ..        |
0.400 ( 0.193%) #####     |
0.440 ( 0.132%) ###       |
0.480 ( 0.223%) ..........|
0.480 ( 1.334%) ##########|##
0.520 ( 0.110%) ###       |
0.560 ( 0.419%) ##########|#
0.600 ( 0.832%) ##########|#
0.640 ( 1.769%) ##########|##
0.680 ( 8.813%) ##########|###########
0.720 (36.767%) ##########|############################################
0.760 (45.712%) ##########|#######################################################
0.800 ( 3.279%) ##########|####
0.840 ( 0.006%)           |
0.880 ( 0.011%)           |
0.920 ( 0.022%) #         |
0.960 ( 0.072%) ##        |

Threshold optimization for hamcutoff=0.30, spamcutoff=0.70: cost=$206.30
Total ham:spam:   19764:18144
FP:     0 0.000%    FN:     9 0.050%
Unsure:  1973 5.205%     (ham:   528 2.672%    spam:  1445 7.964%)
TCRs:              l=1 12.479    l=5 12.479    l=9 12.479
SUMMARY: 0.30/0.70  fp     0 fn     9 uh   528 us  1445    c 206.30
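
For anyone wanting to check the summary figures above, here is a minimal Python sketch that recomputes them from the raw counts. The TCR and cost formulas are inferred from the numbers in this run, not taken from the tool that printed them: TCR matches spam / (l*FP + FN + unsure spam), and the cost matches FN + 0.1 * unsure. The function name `summarize` and the FP weight of 10 are assumptions; since FP is 0 here, the FP weight cannot be determined from this output.

```python
def summarize(ham, spam, fp, fn, unsure_ham, unsure_spam,
              lam=1.0, fp_cost=10.0, fn_cost=1.0, unsure_cost=0.1):
    """Recompute the summary figures from raw counts.

    fp_cost is an assumed weight (FP is 0 in this run, so it is untestable);
    fn_cost and unsure_cost are inferred from cost=$206.30.
    """
    total = ham + spam
    unsure = unsure_ham + unsure_spam
    # Inferred: unsure spam counts against TCR the same way an FN does.
    denom = lam * fp + fn + unsure_spam
    return {
        "fp_pct": 100.0 * fp / ham,
        "fn_pct": 100.0 * fn / spam,
        "unsure_pct": 100.0 * unsure / total,
        "unsure_ham_pct": 100.0 * unsure_ham / ham,
        "unsure_spam_pct": 100.0 * unsure_spam / spam,
        "tcr": spam / denom if denom else float("inf"),
        "cost": fp_cost * fp + fn_cost * fn + unsure_cost * unsure,
    }


# Counts taken from the 10-fold cross validation summary above.
result = summarize(ham=19764, spam=18144, fp=0, fn=9,
                   unsure_ham=528, unsure_spam=1445)
for key, value in result.items():
    print(f"{key:16s} {value:.3f}")
```

With FP=0 the l weighting has no effect, which is consistent with the TCR being identical for l=1, l=5 and l=9.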