On Mon, Jun 27, 2005 at 03:54:35PM +0100, Nix wrote:
> > run with --learn=N -- we're going to want to figure out N
> > small # for large # of messages, large # for small # of messages?
> That sounds like an optimization problem to me (find that percentage
> which yields the greatest accuracy when tested against an entirely
> unrelated corpus).
Well, it's more about finding an N that simulates real-world behavior. We
don't want the N that gives the best results unless that same N matches what
the average user actually does.
> ... ah, I see, and this gives you Bayes-plus-net results, from which you
> can determine the other results by just filtering certain rules out of
> the mass-check results. Neat.
Yeah, previous mass-checks required 3 runs because we let auto-learn do its
thing, which required scores to be set, and Bayes depended on net rules,
etc, etc.
We're now going to simulate manual learning instead of autolearning, which
means we can just do 1 run and generate all the results from there.
--
Randomly Generated Tagline:
Hee, hee! I can be a jerk and no one can stop me!
-- Homer Simpson
Itchy & Scratchy Land
