> More thought ... what if SA systems were to accumulate daily
> statistics, along the lines of one record for each rule, containing:

That sounds like the vague idea I had, fleshed out in more detail.
Certainly the goal is basically to answer, for each rule:

1 does this rule hit anything?
2 does it hit what it was supposed to hit?
3 does it look like a score adjustment might help, either up or down?
4 is this hitting something in a language that it wasn't intended to hit?

I think to do that we need basically anonymous information, with the exception 
that we should know the primary site language(s) to help diagnose foreign 
language problems.  
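
As a rough sketch of the kind of per-rule daily record that could answer
the four questions above (this is only my guess at a shape, not anything SA
does today; the class name, field names, and language tags are all made up
for illustration, and it assumes the site already knows whether each message
was spam or ham):

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RuleDayStats:
    """One day's counters for one rule at one (anonymous) site."""
    rule: str                      # rule name; hypothetical in this sketch
    spam_hits: int = 0             # hits on messages classified as spam
    ham_hits: int = 0              # hits on messages classified as ham
    langs: Counter = field(default_factory=Counter)  # hits per message language

    def record(self, is_spam, lang):
        """Count one hit of this rule on a message."""
        if is_spam:
            self.spam_hits += 1
        else:
            self.ham_hits += 1
        self.langs[lang] += 1

    def spam_ratio(self):
        """Fraction of hits that were on spam, or None if the rule never hit."""
        total = self.spam_hits + self.ham_hits
        return self.spam_hits / total if total else None


def off_language_hits(stats, site_langs):
    """Hits on messages in languages outside the site's primary language(s)."""
    return sum(n for lang, n in stats.langs.items() if lang not in site_langs)
```

A rule with zero total hits answers question 1, spam_ratio() speaks to
questions 2 and 3, and off_language_hits() is a crude check for question 4.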

In addition, I think a site should be able to report a contact address if it 
wants to.  This could be useful if the stats indicate that the site has a 
seemingly local rule that is doing really well.  There would then be someone 
we could write to and ask whether they would be willing to contribute it to 
the regular rules.

Another thing that would be nice to get from sites would be rule overlap 
information.  I'm not sure how to accumulate this with any efficiency, nor how 
to report it compactly.  But with a good idea of how often each rule hits spam 
versus ham, and a decent indication of rule overlap, it should be possible to 
generate theoretical scoring profiles that might work better than the default 
scores.
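
One possible way to accumulate overlap, sketched below, is to count how
often each pair of rules hits the same message and report one short line
per pair.  This is only an illustration under my own assumptions: the
function names and the report format are invented, and the work per message
grows with the square of the number of rules that hit it, which is probably
tolerable only because most messages hit a handful of rules.

```python
from collections import Counter
from itertools import combinations


def count_overlaps(messages):
    """messages: iterable of lists of rule names that hit each message.

    Returns (single, pair): per-rule hit counts and per-pair co-hit counts.
    """
    single = Counter()
    pair = Counter()
    for rules in messages:
        hit = sorted(set(rules))          # dedupe; sort so pairs are canonical
        single.update(hit)
        pair.update(combinations(hit, 2))
    return single, pair


def overlap_report(single, pair, min_count=1):
    """One compact line per pair: 'RULE_A RULE_B co_hits jaccard'."""
    lines = []
    for (a, b), n in sorted(pair.items()):
        if n >= min_count:
            # Jaccard overlap: co-hits over messages hit by either rule.
            jaccard = n / (single[a] + single[b] - n)
            lines.append(f"{a} {b} {n} {jaccard:.2f}")
    return lines
```

Raising min_count would keep the report small by dropping pairs that
overlap only rarely.  Running this separately over the spam and ham piles
would give overlap within each category.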

             Loren
