> > So, we wouldn't be able to unlearn historical emails (processed
> > before the upgrade),

Yes.

> > but there'd be few other side-effects?

Right, practically no other ill effects. The token format remains
unchanged.

> yes old learned mails cant be unlearned, even bayes_tokens is equal,
> it cant track if it was spam or ham learned

Spamminess is a property of a token, so you still know whether a token
is spammy or hammy even when no trace of the original message remains
in the 'seen' set.

It is not uncommon to ditch the 'seen' table (if using SQL) every now
and then when it grows large. Changing the canonicalization algorithm
used to compute the message digest stored in the 'seen' set is no
different from ditching the 'seen' table; it just wastes some storage
until the old entries expire.

In my view there is hardly any reason to worry about the change of
canonicalization algorithm for computing the message digest. The effect
is the same as setting a very short expiration time on the 'seen'
entries (this is now configurable in the Redis backend). The unchanged
token set ensures that Bayes classification remains as effective as
before.

Mark
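
P.S. To illustrate the point, here is a toy sketch in Python. It is not the actual SpamAssassin code: the class ToyBayesStore, the whitespace tokenizer, and both canonicalization functions are hypothetical stand-ins. It only shows that token counts live apart from the 'seen' digests, so a digest change blocks unlearning of old mail without touching the tokens.

```python
import hashlib
from collections import defaultdict


class ToyBayesStore:
    """Toy model: per-token counts plus a 'seen' map keyed by message digest."""

    def __init__(self, canonicalize):
        self.canonicalize = canonicalize           # hypothetical canonicalization hook
        self.tokens = defaultdict(lambda: [0, 0])  # token -> [spam_count, ham_count]
        self.seen = {}                             # digest -> "spam" or "ham"

    def digest(self, message):
        # The digest depends on the canonicalization in use.
        return hashlib.sha1(self.canonicalize(message).encode()).hexdigest()

    def _tokenize(self, message):
        # Tokenization is independent of the digest (assumption of this sketch).
        return set(message.lower().split())

    def learn(self, message, is_spam):
        msgid = self.digest(message)
        if msgid in self.seen:
            return False                           # already learned
        idx = 0 if is_spam else 1
        for tok in self._tokenize(message):
            self.tokens[tok][idx] += 1
        self.seen[msgid] = "spam" if is_spam else "ham"
        return True

    def forget(self, message):
        kind = self.seen.pop(self.digest(message), None)
        if kind is None:
            return False                           # no trace in 'seen': cannot unlearn
        idx = 0 if kind == "spam" else 1
        for tok in self._tokenize(message):
            self.tokens[tok][idx] -= 1
        return True


def old_canon(message):
    return message                                 # stand-in for the old algorithm


def new_canon(message):
    return " ".join(message.split())               # stand-in for the new algorithm


store = ToyBayesStore(old_canon)
old_msg = "cheap  pills   online"
store.learn(old_msg, is_spam=True)
tokens_before = {t: tuple(c) for t, c in store.tokens.items()}

store.canonicalize = new_canon                     # the "upgrade"
forgot_old = store.forget(old_msg)                 # digest no longer matches
tokens_after = {t: tuple(c) for t, c in store.tokens.items()}

new_msg = "meeting agenda attached"
store.learn(new_msg, is_spam=False)
forgot_new = store.forget(new_msg)                 # learned after the upgrade
```

In this sketch forgot_old ends up False while tokens_after equals tokens_before, and forgot_new is True: only the ability to unlearn pre-upgrade mail is lost, and the stale 'seen' entry just sits there until it expires.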
