Saturday, July 23, 2005, 8:36:58 PM, Duncan wrote:

DF> * We discussed at length the ideas for the new rules project, and we
DF>   came up with some ideas, which we're trying to track
DF>   http://wiki.apache.org/spamassassin/RulesProjectPlan (Please give us
DF>   your feedback)
http://wiki.apache.org/spamassassin/RulesProjStreamlining

One item not yet mentioned on this page is how to score rules headed either for core and rapid distribution (such as via sa-update) or for the extra rule sets.

The ideal would be to find some way to incorporate new rules into a GA/Perceptron-like mechanism, perhaps a Perceptron run which a) assumes whatever hit frequencies applied in the last full scoring run, b) freezes all scores in all score sets according to the most recent distribution, and then c) incorporates an sa-update scoring run and calculates appropriate scores for the new rules.

If that's not practical, then perhaps we can use some standardized algorithms to determine provisional scores. The algorithms we use for general-purpose rules within SARE seem to work very well, adding significantly to spam scores without causing any significant number of FPs. Would it be appropriate for me to post those algorithms in the wiki as part of a "scoring" discussion? I'm thinking this could easily grow to warrant a page of its own...

Bob Menschel
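To make the frozen-score Perceptron run proposed above a little more concrete, here is a minimal sketch in Python. It is NOT SpamAssassin's actual masses/perceptron tooling; it is a plain perceptron update, assuming a hypothetical corpus of (rule hits, is_spam) pairs, a hypothetical `frozen` dict holding the current distribution's scores, and the default required_score of 5.0. Only the new rules' scores are ever adjusted; the names `fit_new_rule_scores`, `lr`, `epochs`, and `max_score` are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical corpus format: (set of rule names that hit, is_spam label).
Message = Tuple[Set[str], bool]

THRESHOLD = 5.0  # SpamAssassin's default required_score


def fit_new_rule_scores(
    corpus: List[Message],
    frozen: Dict[str, float],
    new_rules: List[str],
    epochs: int = 50,
    lr: float = 0.1,
    max_score: float = 4.0,
) -> Dict[str, float]:
    """Learn scores for new_rules only; every score in `frozen` stays fixed."""
    weights = {rule: 0.0 for rule in new_rules}
    for _ in range(epochs):
        errors = 0
        for hits, is_spam in corpus:
            # Total score = frozen (existing) scores + current new-rule scores.
            total = sum(frozen.get(r, 0.0) for r in hits)
            total += sum(weights[r] for r in hits if r in weights)
            if (total >= THRESHOLD) == is_spam:
                continue  # classified correctly, nothing to adjust
            errors += 1
            # Perceptron step: raise new-rule scores on missed spam, lower
            # them on false positives. Assumption: new rules are spam-sign,
            # so scores are clipped to [0, max_score].
            step = lr if is_spam else -lr
            for r in hits:
                if r in weights:
                    weights[r] = min(max_score, max(0.0, weights[r] + step))
        if errors == 0:
            break  # every message in the corpus is classified correctly
    return weights


if __name__ == "__main__":
    frozen = {"OLD_RULE": 3.0}
    corpus = [
        ({"OLD_RULE", "NEW_RULE"}, True),
        ({"OLD_RULE"}, False),
        ({"NEW_RULE"}, False),
    ]
    print(fit_new_rule_scores(corpus, frozen, ["NEW_RULE"]))
```

In this toy corpus NEW_RULE climbs to roughly 2.0, just enough to push the spam message over the threshold while the ham messages stay below it. The clipping bound stands in loosely for the "no significant number of FPs" goal; a real run would want a proper hit-frequency corpus and probably a margin rather than a bare threshold test.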
