> More thought ... what if SA systems were to accumulate daily
> statistics, along the lines of one record for each rule, containing:
That sounds like the general sort of vague idea I had, fleshed out in more
detail.
The goal is basically to answer, for each rule:
1. Does this rule hit anything?
2. Does it hit what it was supposed to hit?
3. Does it look like a score adjustment might help, either up or down?
4. Is it hitting something in a language that it wasn't intended to hit?
I think to do that we basically need anonymous information, with the exception
that we should know the site's primary language(s) to help diagnose
foreign-language problems.
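
As a rough sketch only, an anonymous per-rule daily record and a check against
the four questions above might look something like the following. The field
names, the off_language_hits counter, and every threshold are assumptions made
up for illustration; none of this is an existing SA format or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuleDailyStats:
    """One anonymous record per rule per day (hypothetical layout)."""
    rule: str
    spam_hits: int = 0
    ham_hits: int = 0
    # Hits on mail that is not in one of the site's primary languages.
    off_language_hits: int = 0


def diagnose(stats: RuleDailyStats, is_spam_rule: bool = True) -> List[str]:
    """Return rough notes for one rule; all thresholds are arbitrary guesses."""
    total = stats.spam_hits + stats.ham_hits
    if total == 0:
        return ["never hits"]  # question 1

    notes = []
    # Question 2: fraction of hits that landed on the intended class.
    intended = stats.spam_hits if is_spam_rule else stats.ham_hits
    accuracy = intended / total
    if accuracy < 0.5:
        notes.append("mostly hits the wrong class")
    # Question 3: crude hints about a possible score adjustment.
    elif accuracy > 0.99:
        notes.append("very accurate; score could perhaps go up")
    elif accuracy < 0.9:
        notes.append("noisy; score could perhaps go down")
    # Question 4: is the rule mostly firing on off-language mail?
    if stats.off_language_hits / total > 0.5:
        notes.append("mostly hits off-language mail")
    return notes
```

Nothing in the record identifies the site; the primary language(s) would be
reported once per site rather than once per rule.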
In addition, I think a site should be able to optionally report a contact
address. This could be useful if the stats indicate that they have a seemingly
local rule that is doing really well. There would then be someone we could
write to and ask whether they would be willing to contribute it to the regular
rules.
Another thing that would be nice to get from sites would be rule overlap
information. I'm not sure how to accumulate this with any efficiency, nor how
to report it compactly. But with a good idea of which rules hit in the spam
and ham categories, and a decent indication of rule overlap, it should be
possible to generate theoretical scoring profiles that might work better than
the default.
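
One naive way to accumulate overlap, sketched here purely as an illustration,
is to count how often each pair of rules hits the same message and turn that
into a Jaccard-style overlap figure. The function names and the input shape
(one set of rule names per message) are assumptions, and the pair count grows
quadratically with the number of rules hitting a message, so this does not
solve the efficiency or compactness problem; it only shows what the data
would be.

```python
from collections import Counter
from itertools import combinations


def count_overlap(messages):
    """messages: iterable of collections of rule names that hit one message."""
    singles = Counter()  # rule -> number of messages it hit
    pairs = Counter()    # (rule_a, rule_b), sorted -> messages both hit
    for hits in messages:
        rules = sorted(set(hits))
        singles.update(rules)
        pairs.update(combinations(rules, 2))
    return singles, pairs


def overlap(singles, pairs, a, b):
    """Jaccard overlap: messages hit by both / messages hit by either."""
    both = pairs[tuple(sorted((a, b)))]
    either = singles[a] + singles[b] - both
    return both / either if either else 0.0


# Hypothetical rule names, four messages.
singles, pairs = count_overlap([{"A", "B"}, {"A", "B", "C"}, {"A"}, {"C"}])
print(overlap(singles, pairs, "A", "C"))  # 0.25
```

Keeping separate counters for spam and ham would give the per-category
overlap mentioned above.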
Loren