http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376





------- Additional Comments From [EMAIL PROTECTED]  2007-07-04 00:25 -------
Unfortunately, it's been a while since I've looked at this stuff. (Actually, it's
been about 3 months, which is hardly a while, but it's been a busy 3 months.)

In no particular order:

- The BAYES_* rules are marked as immutable right now (IIRC). This really limits
optimization in score sets 2 and 3, the two score sets with Bayes enabled.

- Score ranges need to be better defined. (Perhaps require that entries fit within
the current score ranges?) If we don't clearly define and restrict the score
ranges, the best submission will probably be the one with the least restricted
scores. Score ranges prevent scores from being over-optimized to our data set.
Splitting our data set into training and test sets doesn't really catch this
over-optimization, because both sets come from the same data, which has its own
unique characteristics. (I'm sure there are technical terms for this; I just
don't remember what they are.)

- I already have a copy of the test set (if I can find it). Does that make me
ineligible? :-)

- By requiring scores in the current format, we are eliminating a whole class of
scoring systems. For example, suppose I wanted to try a decision tree system to
detect spam based on SpamAssassin rules (this would obviously work very poorly):
there would be no way to convert that tree into a set of scores.
  - The logistic regression (LR) experiments Steve and I did relied on a logistic
decision rule (i.e. a message is spam if 1 / (1 + exp(-(scores * rule_hits))) >
probability threshold, where scores * rule_hits is the sum of the scores of the
rules that hit). This is easy to convert into traditional SpamAssassin scores
using algebra, but other systems may not be.
  - If we scrap the requirement that output be expressed as current SpamAssassin
scores, our score-range problem becomes more significant: score ranges don't
mean anything if we're not talking about traditional SpamAssassin scores.
  - Ask me if this isn't clear; it's tricky to explain.

- Our evaluation criterion is currently undefined. We need a clear, single
measurement to decide on a winner. (In our research, we used TCR (total cost
ratio) on the test set with lambda = 50 as our "goal" criterion.) Depending on
whether and how we resolve the previous point, we need to fix a single threshold
value (for example, 5.0) as our sole test point.

- Do you think people are actually going to be interested enough in this to
devote a good chunk of time to it? I hope so...
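To make the LR-to-scores conversion and the TCR criterion above concrete, here is
a minimal Python sketch. It is not the original experiment code. It assumes a
model with no separate bias term, exactly as the decision rule above is written,
and a probability threshold above 0.5; all function and rule names are
hypothetical. Because 1 / (1 + exp(-z)) > p is equivalent to z > ln(p / (1 - p)),
multiplying every LR weight by 5.0 / ln(p / (1 - p)) gives scores that flag
exactly the same messages at the example threshold of 5.0. TCR is computed as
N_spam / (lambda * FP + FN), where FP counts ham flagged as spam and FN counts
missed spam; higher is better.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def logit(p: float) -> float:
    """Inverse of the logistic function 1 / (1 + exp(-z))."""
    return math.log(p / (1.0 - p))


def lr_to_sa_scores(weights: Dict[str, float], p: float,
                    sa_threshold: float = 5.0) -> Dict[str, float]:
    """Rescale LR weights so that, for any set of rule hits,
    sum(scores) > sa_threshold  <=>  1 / (1 + exp(-sum(weights))) > p.

    Assumption: the model has no separate bias term.
    """
    t = logit(p)  # the LR rule is equivalent to sum(weights) > t
    if t <= 0:
        # Only a positive rescaling factor preserves the direction of ">".
        raise ValueError("probability threshold must be > 0.5")
    k = sa_threshold / t
    return {rule: k * w for rule, w in weights.items()}


def classify(scores: Dict[str, float], hits: Set[str],
             threshold: float = 5.0) -> bool:
    """SpamAssassin-style decision: spam if the hit rules' scores exceed the threshold."""
    return sum(scores.get(rule, 0.0) for rule in hits) > threshold


def tcr(n_spam: int, false_pos: int, false_neg: int, lam: float = 50.0) -> float:
    """Total Cost Ratio: n_spam / (lam * FP + FN). Infinite if there are no errors."""
    cost = lam * false_pos + false_neg
    return math.inf if cost == 0 else n_spam / cost


def evaluate(scores: Dict[str, float],
             corpus: Iterable[Tuple[Set[str], bool]],
             threshold: float = 5.0, lam: float = 50.0) -> float:
    """Compute TCR over a labelled corpus of (rule hits, is_spam) pairs."""
    n_spam = fp = fn = 0
    for hits, is_spam in corpus:
        predicted = classify(scores, hits, threshold)
        n_spam += int(is_spam)
        fp += int(predicted and not is_spam)  # ham flagged as spam
        fn += int(is_spam and not predicted)  # spam that got through
    return tcr(n_spam, fp, fn, lam)


if __name__ == "__main__":
    # Hypothetical LR output for three made-up rules.
    weights = {"RULE_A": 2.0, "RULE_B": 1.5, "RULE_C": -1.0}
    scores = lr_to_sa_scores(weights, p=0.9)
    corpus = [({"RULE_A", "RULE_B"}, True),   # caught
              ({"RULE_A", "RULE_C"}, True),   # missed (FN)
              ({"RULE_C"}, False),            # correctly passed
              ({"RULE_A", "RULE_B"}, False)]  # false positive
    print(evaluate(scores, corpus))  # 2 / (50 * 1 + 1), about 0.0392
```

Note that clipping the rescaled scores to fixed score ranges would break this
exact equivalence, which is one concrete way the score-range and output-format
questions interact.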

Makes me think I should have submitted a talk to ApacheCon... it'd be a great
way to kick off this contest.


