-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frank Heydlauf writes:
> Hi Duncan,
> 
> On Wed, Jul 20, 2005 at 12:41:48AM -0400, Duncan Findlay wrote:
> > 
> > I think the first point is the bigger one. Ultimately, Dan's sandbox
> > proposal may solve part of the "not enough rules" problem by making it
> > easier for people to contribute rules. But I'd like to hear from
> > potential rule submitters -- would this be a step in the right
> > direction? Is this something that you would be on board with? Would
> > you be more inclined to contribute rules?
> 
> Maybe a bit off-topic, on the other hand... see below.
> 
> 1)
> What I miss most is a transparent dataset about every rule.
> I'd like to know
> - percentage of false positives
> - percentage of false negatives
> - percentage of true positives
> - percentage of true negatives
> - number of mails checked for the results above
> - standard deviation of the percentages above
> 
> These numbers should be available for masses in
> different regions and languages, e.g. Europe/English
> and Europe/German, since there are big differences
> in the effectiveness of rules.

Yes -- hit-frequencies and the current nightly-mass-check reporting do
this, including giving us visibility into 1, possibly 2, non-English
corpora.
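
As a rough sketch of the per-rule statistics being asked for: the helper below is hypothetical (not part of the mass-check tools), and it simplifies a mass-check log down to a list of (is_spam, rule_hit) pairs, one per message. For each of TP/FP/FN/TN it reports the percentage plus the binomial standard error, which serves as the requested "standard deviation of the percentages":

```python
import math

def rule_stats(results):
    """Compute hit statistics for one rule.

    `results` is a list of (is_spam, rule_hit) boolean pairs, one per
    message -- a simplified stand-in for mass-check output.
    Returns {"tp": (pct, stderr_pct), ..., "n": corpus_size}.
    """
    n = len(results)
    tp = sum(1 for spam, hit in results if spam and hit)
    fp = sum(1 for spam, hit in results if not spam and hit)
    fn = sum(1 for spam, hit in results if spam and not hit)
    tn = sum(1 for spam, hit in results if not spam and not hit)
    stats = {"n": n}
    for name, count in (("tp", tp), ("fp", fp), ("fn", fn), ("tn", tn)):
        p = count / n
        # standard error of a binomial proportion: sqrt(p*(1-p)/n)
        stats[name] = (100.0 * p, 100.0 * math.sqrt(p * (1 - p) / n))
    return stats
```

Run per corpus (Europe/English, Europe/German, ...) this yields exactly the per-region breakdown requested above.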

> 2) 
> Detection of redundancy or linear independence.
> Is my new rule covered or disabled by another rule,
> or does it affect existing rules?
> This could be detected with a MassCheck.

hit-frequencies -o switch does this.  we had this enabled for automc,
and it'd be desirable to get it working on a new system.
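
The idea behind that overlap reporting can be sketched in a few lines; this is a simplified illustration, not the actual hit-frequencies code, and it assumes each rule's hits are available as a set of message IDs:

```python
def rule_overlap(hits_a, hits_b):
    """Fraction of rule A's hits that rule B also hits.

    `hits_a` and `hits_b` are sets of message IDs matched by each rule.
    A value near 1.0 means A is (nearly) subsumed by B -- i.e. A adds
    little beyond what B already catches.
    """
    if not hits_a:
        return 0.0
    return len(hits_a & hits_b) / len(hits_a)
```

Comparing a candidate rule pairwise against every existing rule in one mass-check pass answers the "is my new rule covered by another rule?" question directly.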

> 3)
> As Loren said before, new rules become useless when
> posted on the list.

see reply to that mail.

> If you implement 1), this could give strong
> feedback and motivation to the rule contributors.
> If you collect the statistics automatically from
> registered (trusted) servers, you wouldn't even
> have to run your own mass checks!
> Benefit for the user: very fast feedback about which
> rules are actually useful.
> 
> About 2): I sometimes wonder if my rules are really
> useful. This could be an indicator. Since I don't
> want to commit useless rules, this may help, even
> if it's only a small point.
> 
> Point 3) is a very problematic one.
> The only way I can see (keeping the source open) is
> to react very fast, very flexibly and very individually.
> 
> This is a "goto 1)". If I have a big pool of rules
> where I myself can decide which one to take and which
> not - based on real facts, not on guessing - this
> would be a great improvement.
> 
> My idea about this is to send a FN to a reference
> server, see which (even very new and little-tested)
> rules match, look at the statistics and decide
> whether to include it or not - or - if no rule matches,
> to provide one.
> For each rule a set of matching spam-mails should
> be stored by the author to cross check other rules
> for linear dependencies.
> 
> Sadly, the scoring model actually used is not helpful
> for this approach :( It would be much better to have
> real statistical scoring where I could just multiply
> the probabilities of each matched rule to get a result.
> This result would tell me: this is 99% spam, and
> the probability of being wrong is 0.3%, based
> on the mass Europe/German.
> The statistical scoring could be calculated directly
> and quickly from the feedback in 1) and/or with a MC,
> and - don't underestimate this - this approach would
> make it *much* easier and more accurate to include
> external modules like NiX-Spam:
> external modules like NiX-Spam:
> http://www.heise.de/ix/nixspam/
> http://www.bonengel.de/index.php?id=7
> Even the Bayes classifier would be much easier to
> score, and you'd no longer need 4 different score sets
> for 'w/ and w/o Bayes', 'w/ and w/o network'.
> Be aware that you double the number of score sets
> with each new class of tests you implement in
> the current scoring model!
> 
> I know the proposed change in scoring would be a
> really big step, but I think it's absolutely necessary
> in order to be prepared for flexible and fast future
> development.

FWIW, we tried that -- using rules' S/O ratios as Bayesian probabilities.
The results were quite a bit less effective than the current additive
system, which was a little surprising; we surmised it was because we
don't have any rules that indicate that a mail is ham, so the classifier
only had half of the data required to make accurate guesses.
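
For reference, the multiplicative combination being discussed looks roughly like this (a sketch, not the experimental code from 2002; it assumes each rule supplies a per-rule spam probability such as its S/O ratio):

```python
import math

def combine_probs(probs):
    """Combine per-rule spam probabilities with the naive-Bayes
    product rule:  P = prod(p) / (prod(p) + prod(1 - p)).

    Probabilities are clamped away from 0 and 1, and the products are
    accumulated in log space to avoid underflow on long rule lists.
    """
    log_spam = 0.0
    log_ham = 0.0
    for p in probs:
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp to avoid log(0)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # P = exp(log_spam) / (exp(log_spam) + exp(log_ham)), stably:
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

Note the failure mode described above is visible in the formula itself: if no rules ever push `p` well below 0.5 (ham indicators), the `prod(1 - p)` side is starved of evidence, so the combined score is biased toward spam.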

If you go back (waaaay back to about dec 2002) in the dev archives
you may be able to find the discussion...

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFC3qQ8MJF5cimLx9ARAtQdAJsG/V951dVgyKeMs8rJr/uXUGDCOwCfXB91
lz8/y6iXL2tyI9dHIijvtOs=
=gtoS
-----END PGP SIGNATURE-----
