On Fri, 01 Jul 2005, Justin Mason suggested tentatively:
> Nix writes:
>> On Thu, 30 Jun 2005, Theo Van Dinter spake:
>> > 18 months ago would be Jan 1 2004, not 2003. We also usually limit to
>> > 6 months, not 18, but ...
>>
>> Six months isn't much for ham at all, is it? That would only give me a
>> thousand or so hams, and more than a hundred times as much spam as ham.
>>
>> This seems a little... unbalanced. Ham doesn't change *that* fast.
>>
>> (Maybe I should suck a few mailing lists into the ham, but I'm chary of
>> that because many of those lists may also be being used by others as
>> ham sources, so it may lead to duplication.)
>
> Yeah, I'm not sure we had such a stringent limit on *ham*. Spam of course
> is different, but iirc old ham isn't such a big problem.

It's just that --after, IIRC, applies to ham as well as spam.

(I should graph hit frequencies against time for my ham and see how much
it *does* change over time. I suspect the changes aren't particularly
large, especially now that most of our body rules hit spam, not ham ---
the biggest reason for a smaller --limit is that it avoids a situation
where Bayes artificially has tokens from mails very widely separated in
time, and I'm not sure how significant an effect *that* is, either. I
rather doubt the distribution of hammy Bayes tokens over time changes
much.

All IMHO, of course. I am but an egg.)
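For anyone who wants to try that graph, here is a minimal sketch of the counting step. It assumes you have already pulled, for each ham message, a Unix timestamp and the list of rules it hit; the input shape, the monthly bucketing and the rule names in the example are all hypothetical, not anything mass-check emits directly. It returns, per month, the fraction of messages each rule hit, which you can then plot.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def hit_frequencies_by_month(messages):
    """Bucket messages by calendar month (UTC) and compute per-rule hit rates.

    messages: iterable of (unix_timestamp, iterable_of_rule_names).
    Returns {"YYYY-MM": {rule: fraction_of_that_month's_messages_hit}}.
    """
    totals = Counter()            # messages seen per month
    hits = defaultdict(Counter)   # month -> rule -> messages hitting it
    for ts, rules in messages:
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        totals[month] += 1
        for rule in set(rules):   # count each rule at most once per message
            hits[month][rule] += 1
    # Every month with messages appears, even if no rule hit in it.
    return {
        month: {rule: n / totals[month] for rule, n in hits[month].items()}
        for month in sorted(totals)
    }


if __name__ == "__main__":
    # Hypothetical ham sample; RULE_A / RULE_B are placeholder rule names.
    ham = [
        (1104537600, ["RULE_A"]),            # 2005-01-01 UTC
        (1104624000, []),                    # 2005-01-02 UTC
        (1107216000, ["RULE_A", "RULE_B"]),  # 2005-02-01 UTC
    ]
    print(hit_frequencies_by_month(ham))
```

Normalising by each month's message count (rather than plotting raw hit counts) keeps a month with unusually little ham from looking like a drop in a rule's hit rate.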
--
`I lost interest in "blade servers" when I found they didn't throw knives
at people who weren't supposed to be in your machine room.'
--- Anthony de Boer