Hi Duncan,

On Wed, Jul 20, 2005 at 12:41:48AM -0400, Duncan Findlay wrote:
>
> I think the first point is the bigger one. Ultimately, Dan's sandbox
> proposal may solve part of the "not enough rules" problem by making it
> easier for people to contribute rules. But I'd like to hear from
> potential rule submitters -- would this be a step in the right
> direction? Is this something that you would be on board with? Would
> you be more inclined to contribute rules?
Maybe a bit off-topic; on the other hand... see below.

1) What I miss most is a transparent set of statistics for every rule.
I'd like to know:

 - percentage of false positives
 - percentage of false negatives
 - percentage of true positives
 - percentage of true negatives
 - number of mails checked for the results above
 - standard deviation of the percentages above

These numbers should be available per corpus for different regions and
languages, i.e. Europe/English, Europe/German, since there are big
differences in how effective the rules are.  (A rough sketch of how such
statistics could be computed is in the P.S. below.)

2) Detection of redundancy or linear dependence: is my new rule covered
or disabled by another rule, or does it affect existing rules?  This
could be detected with a MassCheck.  (See the second sketch in the P.S.)

3) As Loren said before, new rules become useless once they are posted
on the list.

If you implement 1), this could give strong feedback and motivation to
the rule contributors.  If you collect the statistics automatically from
registered (trusted) servers, you would not even have to run your own
mass checks!  The benefit for the user: very fast feedback about which
rules are actually useful.

About 2): I sometimes wonder whether my rules are really useful.  This
could be an indicator.  Since I don't want to commit useless rules, this
may help, even if it's only a small point.

About 3): this is a very problematic one.  The only way I can see
(while keeping the source open) is to react very fast, very flexibly and
very individually.  This is a "goto 1)".  If I have a big pool of rules
from which I can decide for myself which ones to take and which not --
based on real facts, not on guessing -- that would be a great
improvement.  My idea is to send a false negative to a reference server,
see which rules match (even very new and little-tested ones), look at
the statistics and decide whether to include them -- or, if no rule
matches, to provide one.  For each rule, the author should store a set
of matching spam mails so other rules can be cross-checked for linear
dependencies.

Sadly, the scoring model currently in use is not helpful for this
approach :(  It would be much better to have a real statistical scoring
where I could simply multiply the probabilities of each matching rule to
get a result.  That result would tell me: this is 99% spam, and the
probability of that classification being wrong is 0.3%, based on the
Europe/German corpus.  (A toy sketch of this kind of combination is the
last one in the P.S.)

The statistical scoring could be calculated directly and quickly from
the feedback in 1) and/or from a MassCheck, and -- don't underestimate
this -- it would make it *much* easier and more accurate to include
external modules like NiX-Spam:

http://www.heise.de/ix/nixspam/
http://www.bonengel.de/index.php?id=7

Even the Bayes classifier would be much easier to score, and you would
no longer need four different score sets for "with/without Bayes" times
"with/without network tests".  Be aware that the number of score sets
doubles with each new class of tests you add under the current scoring
model!

I know the proposed change in scoring would be a really big step, but I
think it is absolutely necessary in order to be prepared for flexible
and fast future development.

--
Regards
Frank
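P.S.: To make 1) a bit more concrete, here is a rough sketch of how the
per-rule numbers could be computed.  The input format is only an
assumption of mine (one line per message: the true class, then the rules
that hit it), not the real mass-check output, and the FN/TN percentages
are simply the complements of the TP/FP ones printed here.

#!/usr/bin/env python3
# Rough sketch only.  Assumed (not real) log format, one message per line:
#   spam RULE_A,RULE_B
#   ham  RULE_C
import sys
import math

msgs = []           # list of (is_spam, set of rules that hit the message)
rules = set()
for line in sys.stdin:
    parts = line.split(None, 1)
    if not parts:
        continue
    hit = {r.strip() for r in parts[1].split(",")} if len(parts) > 1 else set()
    hit.discard("")
    msgs.append((parts[0] == "spam", hit))
    rules.update(hit)

n_spam = sum(1 for is_spam, _ in msgs if is_spam)
n_ham = len(msgs) - n_spam

for rule in sorted(rules):
    tp = sum(1 for is_spam, hit in msgs if is_spam and rule in hit)
    fp = sum(1 for is_spam, hit in msgs if not is_spam and rule in hit)
    tp_rate = tp / n_spam if n_spam else 0.0
    fp_rate = fp / n_ham if n_ham else 0.0
    # binomial standard error of the FP rate, so the percentage can be
    # judged against the size of the corpus it was measured on
    se_fp = math.sqrt(fp_rate * (1.0 - fp_rate) / n_ham) if n_ham else 0.0
    print("%-30s TP %5.1f%%  FP %5.2f%% (+/- %4.2f%%)  mails checked: %d"
          % (rule, 100 * tp_rate, 100 * fp_rate, 100 * se_fp, len(msgs)))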
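For 2), a similarly rough sketch that looks for near-redundant rule
pairs, again using the invented log format above; the 95% threshold is
arbitrary.

# Reads the same assumed "class rule,rule,..." log and reports, for each
# pair of rules, how much of one rule's hits the other already covers;
# close to 100% means the rule adds little on this corpus.
import sys
from collections import defaultdict
from itertools import combinations

hits = defaultdict(set)        # rule name -> set of message ids it hit
for msg_id, line in enumerate(sys.stdin):
    parts = line.split(None, 1)
    if len(parts) < 2:
        continue
    for rule in (r.strip() for r in parts[1].split(",")):
        if rule:
            hits[rule].add(msg_id)

for a, b in combinations(sorted(hits), 2):
    overlap = len(hits[a] & hits[b])
    if not overlap:
        continue
    cover_a = overlap / len(hits[a])   # fraction of a's hits that b also hits
    cover_b = overlap / len(hits[b])
    if max(cover_a, cover_b) >= 0.95:  # arbitrary threshold
        print("%s / %s: %.0f%% of %s covered, %.0f%% of %s covered"
              % (a, b, 100 * cover_a, a, 100 * cover_b, b))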
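And the "multiply the probabilities" scoring I have in mind is
essentially a naive Bayes combination; here is a toy example with
invented rule names and numbers (in practice the per-rule probabilities
would come from the statistics in 1), measured on a corpus such as
Europe/German).

def spamminess(rule_probs):
    """Combine per-rule P(spam | rule hit) values into one probability,
    (naively) assuming the rules are statistically independent."""
    p_spam = p_ham = 1.0
    for p in rule_probs:
        p_spam *= p
        p_ham *= 1.0 - p
    return p_spam / (p_spam + p_ham)

# hypothetical rules that hit one message, with their measured probabilities
hit_probs = {"EXAMPLE_SUBJECT_RULE": 0.93,
             "EXAMPLE_URI_RULE": 0.88,
             "EXAMPLE_BAYES_99": 0.99}
print("P(spam) = %.4f" % spamminess(hit_probs.values()))

With those three probabilities this prints roughly 0.9999, i.e. "99.99%
spam" -- exactly the kind of directly interpretable number I would like
the scoring to produce.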
