By the way, I should mention where I'm thinking of going with this.

As you can see, if you compare
http://ruleqa.spamassassin.org/20071113-r594464-b (a preflight mass-check
of 4000 messages) vs http://ruleqa.spamassassin.org/20071113-r594456-n (a
nightly mass-check of 50000 messages), there are some major differences in
how accurate the rules are judged to be.

We can now complete a mass-check of 50k messages in 22 minutes, using
distributed mass-check running on the zone, with 2 slaves (talon and
infiltrator) and the corpora from 3 contributors uploaded to the zone.

If we add more servers, I'm hoping we can get to a stage where we can scan
the entire uploaded corpora *on every checkin*, thereby:

  - providing more accurate rule-QA data,
  - delivering it faster, within 30 minutes (quicker than the current
    preflight mass-check),
  - and making the minuscule 4k-message "preflight" corpus obsolete.

That would be cool ;)
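
For a rough idea of how many servers that might take, here is a
back-of-the-envelope sketch. It assumes throughput scales linearly with
the number of slaves, which is untested, and it takes the 50k-in-22-minutes
run with 2 slaves as the baseline. The function name and the 200k corpus
size are made up for illustration; the real size of the uploaded corpora
would need to be plugged in.

```python
# Baseline figures from the run described above.
BASELINE_MSGS = 50_000      # messages scanned
BASELINE_MINUTES = 22       # wall-clock time taken
BASELINE_SLAVES = 2         # talon and infiltrator


def slaves_needed(corpus_msgs, target_minutes=30):
    """Estimate the slaves needed to scan corpus_msgs within target_minutes.

    Hypothetical helper: assumes perfectly linear scaling across slaves
    and ignores upload, startup, and coordination overhead.
    Arguments must be positive integers.
    """
    if corpus_msgs <= 0 or target_minutes <= 0:
        raise ValueError("corpus_msgs and target_minutes must be positive")
    # Slave-minutes of work required, scaled from the baseline run.
    work = corpus_msgs * BASELINE_MINUTES * BASELINE_SLAVES
    # Messages one baseline-sized job covers in the target time.
    capacity = BASELINE_MSGS * target_minutes
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-work // capacity)


if __name__ == "__main__":
    print(slaves_needed(50_000))    # 2: the current setup already fits
    print(slaves_needed(200_000))   # 6: illustrative larger corpus
```

If scaling turns out to be worse than linear, the real number would be
higher, so treat the result as a lower bound.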

There may even be a possibility of using some donated supercomputing
infrastructure to do this in the future.  Who knows how fast it'd be
then... ;)

--j.
