http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
------- Additional Comments From [EMAIL PROTECTED] 2007-07-04 00:25 -------

Unfortunately it's been a while since I've looked at this stuff. (Actually, it's been about 3 months... which is hardly a while, but it's been a busy 3 months...)

In no particular order:

- BAYES_* are marked as immutable right now (IIRC). This really limits optimization in score sets 2 and 3.

- Score ranges need to be better defined. (Perhaps require that entries fit in the current score ranges?) If we don't clearly define/restrict the score ranges, the best submission will probably be the one with the least restricted scores. Score ranges prevent scores from being over-optimized to our data set. Splitting our data set into training and test sets doesn't really catch this over-optimization, since both are drawn from the same data set and share its unique characteristics. (I'm sure there are technical terms for this, I just don't remember what they are...)

- I already have a copy of the test set (if I can find it). Does that make me ineligible? :-)

- By requiring scores in the current format, we are eliminating a whole class of scoring systems. For example, suppose I wanted to try a decision tree system to detect spam based on SpamAssassin rules (this would obviously work very poorly). It would be impossible to convert this into a set of scores.

- The LR experiments Steve and I did relied on a logistic decision rule (i.e. a message is spam if 1 / (1 + exp(-(scores * rule_hits))) > probability threshold). This is easy to convert into traditional SpamAssassin scores using algebra, but other systems may not be.

- If we scrap the requirement for output to be in terms of current SpamAssassin scores, our score ranges problem becomes more significant -- score ranges don't mean anything if we're not talking about traditional SpamAssassin scores.

- Ask me if this isn't clear -- it's tricky to explain.

- Our evaluation criterion is currently undefined. We need a clear, single measurement to decide on a winner. (In our research, we used TCR on the test set with lambda = 50 as our "goal" criterion.) Depending on how/if we resolve the previous point, we need to set a threshold value (for example 5.0) as our sole test point.

- Do you think people are actually going to be interested enough in this to devote a good chunk of time to it? I hope so... Makes me think I should have submitted a talk to ApacheCon... it'd be a great way to kick off this contest.
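
For anyone wondering what the "algebra" for the LR case might look like, here is a minimal sketch. It is not the code from the experiments mentioned above. It assumes the model has one weight per rule plus an optional bias term (the bias, the function names, and the rule names RULE_A etc. are all hypothetical), and it relies on the fact that sigmoid(z) > p is equivalent to z > ln(p / (1 - p)). The weights are then rescaled so that the cutoff lands on the usual threshold of 5.0.

```python
import math


def lr_to_sa_scores(weights, bias=0.0, prob_threshold=0.5, sa_threshold=5.0):
    """Rescale logistic-regression weights into additive, SpamAssassin-style scores.

    sigmoid(bias + sum(w_i * x_i)) > p   is equivalent to
    sum(w_i * x_i) > ln(p / (1 - p)) - bias
    so multiplying every weight by sa_threshold / cutoff moves the cutoff
    to sa_threshold. This only works when the cutoff is positive.
    """
    logit = math.log(prob_threshold / (1.0 - prob_threshold))
    cutoff = logit - bias
    if cutoff <= 0:
        raise ValueError("cutoff must be positive to map onto a positive threshold")
    scale = sa_threshold / cutoff
    return {rule: w * scale for rule, w in weights.items()}


def lr_is_spam(weights, bias, prob_threshold, hits):
    """Original logistic decision rule on the set of rules that hit."""
    z = bias + sum(weights[rule] for rule in hits)
    return 1.0 / (1.0 + math.exp(-z)) > prob_threshold


def sa_is_spam(scores, sa_threshold, hits):
    """Additive rule: sum the scores of the rules that hit, compare to threshold.

    A strict '>' is used here to mirror the logistic rule; the two rules can
    only disagree for a message that lands exactly on the boundary.
    """
    return sum(scores[rule] for rule in hits) > sa_threshold


# Hypothetical weights, purely for illustration.
example_weights = {"RULE_A": 2.5, "RULE_B": 1.0, "RULE_C": -0.5}
example_scores = lr_to_sa_scores(example_weights, bias=-3.0)
for rule, score in sorted(example_scores.items()):
    print(f"score {rule} {score:.3f}")
```

The rescaled scores give the same verdicts as the logistic rule, but nothing forces them to stay inside any agreed score range, which is the concern raised in the score ranges point.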
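
To make the evaluation point concrete, here is a rough sketch of how a single TCR number could be computed at one fixed threshold. I am assuming the usual definition of total cost ratio from the spam filtering literature, TCR = N_spam / (lambda * false_positives + false_negatives), where a false positive is a ham message marked as spam. The function name and the toy data are made up for illustration; this is not an official contest metric.

```python
def total_cost_ratio(scores, labels, threshold=5.0, lam=50.0):
    """Total cost ratio at a single threshold.

    scores: per-message total scores
    labels: True for spam, False for ham (same order as scores)
    A message is treated as spam when its score reaches the threshold.
    Each false positive costs `lam` times as much as a false negative.
    """
    n_spam = sum(1 for is_spam in labels if is_spam)
    false_pos = sum(
        1 for s, is_spam in zip(scores, labels) if not is_spam and s >= threshold
    )
    false_neg = sum(
        1 for s, is_spam in zip(scores, labels) if is_spam and s < threshold
    )
    cost = lam * false_pos + false_neg
    if cost == 0:
        # No errors at all: the ratio is unbounded.
        return float("inf")
    return n_spam / cost


# Toy data, purely illustrative: four spam messages and two ham messages.
toy_scores = [7.2, 3.1, 9.0, 5.5, 0.4, 6.0]
toy_labels = [True, True, True, True, False, False]
print(total_cost_ratio(toy_scores, toy_labels, threshold=5.0, lam=50.0))
```

Under this definition higher is better, and a value below 1 means the filter costs more than using no filter at all. With lambda = 50 a single false positive dominates the result, so the number is very sensitive to the chosen threshold.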
