Hello Nix,

Saturday, July 2, 2005, 2:46:52 AM, you wrote:

N> This is far more elaborate than needed, I think. Limiting the age of
N> your spam corpus (which I do anyway) and using mass-check normally will
N> do the trick, as mass-check runs through mails in temporal order.  The
N> only `error' will be that ham of age [now - a couple of years] will
N> cohabit in the Bayes DB with spam of age [now - six months]. If this
N> caused a problem Bayes would be nearly useless anyway :)

Except that doing it this simple way (which is how I do normal,
non-Bayes mass-checks) means you'd load (autolearn) a year's worth of
ham into your Bayes database before giving it the first spam. Your
Bayes database will be out of balance until it has learned a
significant amount of spam, or
N> If expiry runs it ditches the ancient email first in any case.
until the first significant expiry gets rid of much of that older ham.
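
To put a rough number on that imbalance, here is a minimal Python
sketch. It is not mass-check itself, just a toy model of "process mail
in temporal order". The function names, the one-message-per-day rate,
and the 18-month ham / 6-month spam windows are all assumptions made up
for illustration.

```python
from datetime import datetime, timedelta


def ham_before_first_spam(messages):
    """Count ham seen before the first spam when (timestamp, label)
    pairs are processed oldest-first, as in a temporal-order run."""
    count = 0
    for _, label in sorted(messages, key=lambda m: m[0]):
        if label == "spam":
            break
        count += 1
    return count


def build_corpus(now, ham_days, spam_days):
    """Hypothetical corpus: one ham per day (at midnight) going back
    ham_days, and one spam per day (at noon) going back spam_days."""
    msgs = [(now - timedelta(days=d), "ham")
            for d in range(ham_days, 0, -1)]
    msgs += [(now - timedelta(days=d) + timedelta(hours=12), "spam")
             for d in range(spam_days, 0, -1)]
    return msgs


# Assumed windows: ham kept for ~18 months, spam limited to ~6 months.
now = datetime(2005, 7, 2)
corpus = build_corpus(now, ham_days=545, spam_days=180)
print("ham learned before first spam:", ham_before_first_spam(corpus))
```

With those assumed windows, roughly a year of ham goes in before any
spam does, which is the imbalance I mean.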

N> I think I'll do a few local perceptron runs with mass-checks with
N> different --limits after the rescoring mass-check is completed, and
N> see just what effect varying the limit on ham actually has. I'm
N> blithering in the absence of data right now.

Good idea. I'm interested to know what you find.

Bob Menschel


