http://bugzilla.spamassassin.org/show_bug.cgi?id=4505
------- Additional Comments From [EMAIL PROTECTED]  2005-08-06 15:56 -------
OK, I got hold of the logs from Henry, and measured some BAYES scores
against the validation set:

base results from comment 28, gen-set3-2.0-5.0-100-nobob:

# Correctly non-spam:  53070  99.96%
# Correctly spam:     121906  98.49%
# False positives:        21  0.04%
# False negatives:      1872  1.51%
# TCR(l=50): 42.360712  SpamRecall: 98.488%  SpamPrec: 99.983%

copying values from set 2 for set 3:

# Correctly non-spam:  53064  99.95%
# Correctly spam:     122453  98.93%
# False positives:        27  0.05%
# False negatives:      1325  1.07%
# TCR(l=50): 46.272150  SpamRecall: 98.930%  SpamPrec: 99.978%

comment 14:

# Correctly non-spam:  53014  99.85%
# Correctly spam:     123093  99.45%
# False positives:        77  0.15%
# False negatives:       685  0.55%
# TCR(l=50): 27.293936  SpamRecall: 99.447%  SpamPrec: 99.937%

comment 42 (the patch in attachment 3051):

# Correctly non-spam:  53068  99.96%
# Correctly spam:     122509  98.97%
# False positives:        23  0.04%
# False negatives:      1269  1.03%
# TCR(l=50): 51.169078  SpamRecall: 98.975%  SpamPrec: 99.981%

I think 3051 has the best scores: fewer FNs than the base results, just
2 more FPs, and sane scores. I'd suggest we just vote on that patch.

If you want to try other values btw -- the logs are in the zone. Do this:

  cd svncheckout/masses
  rm ham.log spam.log
  ln -s /home/corpus-rsync/corpus/scoregen-3.1/gen-set3-2.0-5.0-100-nobob/NSBASE/ham-test.log ham.log
  ln -s /home/corpus-rsync/corpus/scoregen-3.1/gen-set3-2.0-5.0-100-nobob/SPBASE/spam-test.log spam.log
  vi ../rules/50_scores.cf
  ./fp-fn-statistics --scoreset=3
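
For anyone comparing the four result sets by hand, here is a small Python sketch that recomputes the summary line from the raw counts. It assumes the usual definitions: TCR = n_spam / (l * FP + FN) with l=50, recall = TP / (TP + FN), precision = TP / (TP + FP). Those formulas reproduce the TCR, SpamRecall and SpamPrec values quoted in this comment, but this is not taken from the fp-fn-statistics source, and the class and variable names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Raw counts for one candidate, copied from the comment above."""
    label: str
    true_neg: int   # "Correctly non-spam"
    true_pos: int   # "Correctly spam"
    false_pos: int  # "False positives"
    false_neg: int  # "False negatives"

    def tcr(self, lam: float = 50.0) -> float:
        # Assumed definition: total spam over the weighted error count,
        # where each false positive costs `lam` false negatives.
        n_spam = self.true_pos + self.false_neg
        return n_spam / (lam * self.false_pos + self.false_neg)

    def spam_recall(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    def spam_precision(self) -> float:
        return self.true_pos / (self.true_pos + self.false_pos)


RESULTS = [
    Result("base (comment 28)", 53070, 121906, 21, 1872),
    Result("set 2 values copied to set 3", 53064, 122453, 27, 1325),
    Result("comment 14", 53014, 123093, 77, 685),
    Result("comment 42 (attachment 3051)", 53068, 122509, 23, 1269),
]

for r in RESULTS:
    print(f"{r.label:32s} TCR(l=50): {r.tcr():.6f}  "
          f"SpamRecall: {r.spam_recall():.3%}  "
          f"SpamPrec: {r.spam_precision():.3%}")

# Candidate with the highest TCR at l=50.
best = max(RESULTS, key=lambda r: r.tcr())
print("best by TCR:", best.label)
```

With l=50 a single false positive weighs as much as 50 false negatives, which is why comment 14 ranks last on TCR despite having the fewest FNs.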
