Duncan Findlay writes:
> On Wed, Jan 03, 2007 at 02:42:44PM +0000, Justin Mason wrote:
> 
> >   - T + 0 days: announce a heads-up mail. clean up our corpora, get ready
> >     for mass-checking, try out mass-check to spot any big memory leaks or
> >     whatnot, fix remaining bugs that affect mass-checks (esp bug 5260!),
> >     get people signed up, enable all rules in svn.
> 
> >   - T + 1 week, around a Thursday or so: start --bayes --net mass-checks;
> >     move to C-T-R.
> 
> >   - T + 3 weeks, a Monday or so: hopefully finish mass-checks, bugs
> >     allowing ;) (note that includes two weekends.)
> 
> >   - T + 3 weeks: perceptron runs, voting on new proposed scores, etc
> 
> >   - T + 4 weeks and a bit: hopefully ready to release
> 
> +1
> 
> BTW, how do we generate all 4 scoresets from one run? We used to have
> to do two runs, and I can't remember the rationale for that, or the
> rationale for doing it one. :-)

Well, I took a look back at the 3.1.0 score generation to figure this out,
since I'd forgotten. Here are the old instructions:
http://wiki.apache.org/spamassassin/RescoreDetails

Basically, we do a single set-3 mass-check, with all scores unzeroed. This
uses "--bayes --learn=35", which enables Bayes and learns 35% of all mails
in whichever direction SpamAssassin classified them (in other words, a
pretty simplistic auto-learn algorithm, errors included). I think the idea
was to simulate "real" Bayes auto-learning, which makes mistakes too.
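
A minimal sketch of that learning step as described above, not the actual
mass-check code. The names simulate_learning, classify and learn are
hypothetical stand-ins: classify returns True for spam, and learn trains
Bayes on a message in the given direction.

```python
import random


def simulate_learning(messages, classify, learn, learn_fraction=0.35, rng=None):
    """Learn a random fraction of messages in whatever direction the
    classifier chose, so misclassified mail is learned wrongly too.

    All names here are illustrative assumptions, not mass-check internals.
    """
    rng = rng or random.Random(0)  # fixed seed keeps runs repeatable
    learned = 0
    for msg in messages:
        is_spam = classify(msg)
        # rng.random() is in [0.0, 1.0), so about learn_fraction of the
        # messages pass this test.
        if rng.random() < learn_fraction:
            learn(msg, is_spam)
            learned += 1
    return learned
```

Since the direction comes from the classifier and not from the corpus
label, any mistakes it makes end up in the Bayes training too.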

From that single log, we can derive:

set-0: remove all net and BAYES rule hits from the log

set-1: remove all BAYES hits

set-2: remove all net hits

set-3: the log as-is, since that's what we ran
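
The derivation above can be sketched roughly as follows. This assumes the
rule hits for one message are already parsed into a list of names, that
Bayes rules are the ones named BAYES_*, and that the caller supplies the
set of net rule names. The function names and the net_rules parameter are
hypothetical, not part of the real rescoring scripts.

```python
def strip_rules(hits, net_rules, drop_net, drop_bayes):
    """Return the rule hits that survive the requested filtering."""
    kept = []
    for rule in hits:
        if drop_bayes and rule.startswith("BAYES_"):
            continue  # assumption: Bayes rules share the BAYES_ prefix
        if drop_net and rule in net_rules:
            continue  # net_rules is a caller-supplied set of rule names
        kept.append(rule)
    return kept


def derive_scoresets(hits, net_rules):
    """Build the per-scoreset hit lists from one set-3 log entry."""
    return {
        0: strip_rules(hits, net_rules, drop_net=True, drop_bayes=True),
        1: strip_rules(hits, net_rules, drop_net=False, drop_bayes=True),
        2: strip_rules(hits, net_rules, drop_net=True, drop_bayes=False),
        3: list(hits),  # set-3 is the log exactly as run
    }
```

For example, a message that hit BAYES_99, one net rule and one local rule
would keep only the local rule in set-0 and all three in set-3.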

The key appears to be the --learn=35 bit.

It's hard to recall the details -- we didn't note much of it down
I think, and it was 19 months ago :(

--j.
