http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5686
------- Additional Comments From [EMAIL PROTECTED] 2008-01-18 02:26 -------
I had some off-list discussion with Fidelis about this...

he suggests using ROCA% as a better error-rate measurement system:

Fidelis Assis writes:
> Justin Mason wrote:
> > Fidelis Assis writes:
> >> Justin Mason wrote:
> >>> Fidelis Assis writes:
> >>>> Justin Mason wrote:
> >>>>> Fidelis Assis writes:
> >> The other day I was in a discussion on the CRM114 list about error-rate
> >> X ROCA% and I made an analogy to archers showing why I think it's
> >> possibly better for spam filters. It might be interesting, at least as a
> >> curiosity :-)
> >>
> >> http://sourceforge.net/mailarchive/forum.php?thread_name=200711271356.lARDujYL031322%40spoo.merl.com&forum_name=crm114-general
> >
> > Ah, that's a very good explanation. You might have convinced me, I think ;)
> > If I get some time soon, I'll try re-examining those results using
> > 1-ROCA%.

he also suggests changing the inputs to the combiner:

> >>>>> from the EDDC equation is used as P(spam) values and fed into our naive
> >>>>> Bayes combiner, producing a value ranging from 0.0 (nonspam) to 0.5
> >>>>> (unsure) to 1.0 (spam).
> >>>> I don't use probabilities directly, but the ratio
> >>>> 0.59*log10(p(ham)/p(spam)). OSBF probabilities are either very close to
> >>>> 1 or to 0.
> >>> hmm, I may try that.

and he tried out osbf-lua on my test corpus:

> The filter learns better if the order of the messages is the original, or
> random, instead of a batch of a class and then a batch of the other. A
> modified script using random order is attached for your tests.

he gets much better results:

'I did the tests removing the X-Spam-* headers and I got 0 FP and 12 FN
but from the 12, 9 are exactly the same message: msg 33 in spam bucket.4
with Subject: "Congress Proposes Olympic Boycott" (is this spam?); another
2 are also the same message: msg 165 in spam bucket.2, with subject: "Notice
of account temporary suspension" (paypal phishing). The last one is another
paypal phishing, but with the same contents: msg 174 in spam bucket.6. If we
don't count the same mistake repeatedly we have 0 FP and 3 FN, which is still
very good considering that the filter was trained with only 422 msgs, and it
reaches its max accuracy after 2-3k.'

so the code I've got here is still a long way off osbf-lua's accuracy rates...
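
For anyone wanting to play with the two ideas above, here's a rough sketch (not code from SpamAssassin or osbf-lua) of the 0.59*log10(p(ham)/p(spam)) ratio and of a (1-ROCA)% calculation done by brute-force pair counting. The function names, the eps guard against log(0), and the "higher score means spammier" convention are all my own assumptions for illustration.

```python
import math


def log_ratio_score(p_ham, p_spam, scale=0.59, eps=1e-300):
    """Return scale * log10(p_ham / p_spam).

    Positive values lean ham, negative values lean spam.
    eps is an assumption of this sketch: it only avoids log10(0).
    """
    return scale * (math.log10(max(p_ham, eps)) - math.log10(max(p_spam, eps)))


def one_minus_roca_percent(ham_scores, spam_scores):
    """Area above the ROC curve, as a percentage.

    Scores must follow "higher = more spam-like". The value is the
    fraction of (ham, spam) pairs that are ranked the wrong way round,
    with ties counted as half an error.
    """
    if not ham_scores or not spam_scores:
        raise ValueError("need at least one ham and one spam score")
    misordered = 0.0
    for h in ham_scores:
        for s in spam_scores:
            if h > s:
                misordered += 1.0
            elif h == s:
                misordered += 0.5
    return 100.0 * misordered / (len(ham_scores) * len(spam_scores))


if __name__ == "__main__":
    # The ratio is positive for ham, so negate it to get a spamminess score.
    ham = [-log_ratio_score(0.99, 0.01), -log_ratio_score(0.30, 0.70)]
    spam = [-log_ratio_score(0.01, 0.99), -log_ratio_score(0.40, 0.60)]
    print("(1-ROCA)%% = %.1f" % one_minus_roca_percent(ham, spam))
```

The pair counting is quadratic in corpus size, so it's only meant for small test buckets; unlike a plain FP/FN count it doesn't depend on picking a threshold.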
