[Bug 3821] scores are overoptimized for training set

bugzilla-daemon 27 Sep 2004 21:58:26 -0000

http://bugzilla.spamassassin.org/show_bug.cgi?id=3821

------- Additional Comments From [EMAIL PROTECTED]  2004-09-27 14:58 -------
Subject: Re:  scores are overoptimized for training set

This sounds like a reasonable approach.  I can't help out with it at the 
moment, though.  My thesis needs to be finished in 7 weeks.

Henry

>------- Additional Comments From [EMAIL PROTECTED]  2004-09-27 14:16 -------
>'I like your idea concerning "harder to defeat" rules.  I'd also suggest a 
>classification of "more likely to be correct", which would include 
>- obfuscation rules with ultra high confidence ([EMAIL PROTECTED])
>- spam headers (X_MESSAGE_INFO)
>- known forgeries (FAKE_OUTBLAZE_RCVD)
>- broken ratware (subject =~ /%RAND/)
>
>Perhaps such rules can be flagged via tflags or similar mechanism, such that 
>the automatic scoring mechanism will apply preferential treatment to them, 
>provided that the scoring mass-checks hit no ham at all (or no spam if a 
>negative scoring rule).'
>
>BTW, this is the "rule reliability tflag" idea again; basically provide a way
>to hint that this rule is reliable, and this rule should not be considered
>reliable -- no matter what their hit-rates in mass-checks were.
>
>I agree it may have good effects as a hint to the Perceptron, so it may now be
>time to do this.  What d'you think, Henry?
>
>
>
>------- You are receiving this mail because: -------
>You are the assignee for the bug, or are watching the assignee.
>  
>
