Bill, this is a pretty good point, but as far as Chris Means' mailet is concerned it diverges from the issue, because a shared corpus is actually a very good starting point for training a system, and more effective than starting from scratch.
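A minimal sketch of why a shared corpus works as a starting point. It assumes a simplified Graham-style estimator, which is not necessarily what the mailet actually does, and the names `tokenize` and `BayesCorpus` are hypothetical. The shared messages only seed the per-token counts, and each user's later corrections go through the same `train` call on top of them. The token probabilities are computed entirely from the stored messages, so the messages can always regenerate the probabilities, but not the other way round.

```python
from collections import Counter
import re


def tokenize(text):
    # Hypothetical tokenizer: the set of lowercase runs of letters, digits, $, ' and -
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class BayesCorpus:
    """Simplified Graham-style token statistics (an assumption, not the mailet's code)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of good messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        # Shared-corpus seeding and per-user corrections both just add counts here
        tokens = tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def token_probability(self, token):
        # Share of spam messages containing the token vs. share of good messages
        spam_freq = self.spam_tokens[token] / max(self.n_spam, 1)
        ham_freq = self.ham_tokens[token] / max(self.n_ham, 1)
        if spam_freq + ham_freq == 0:
            return 0.4  # unseen token: slightly-innocent default, as in Graham's scheme
        p = spam_freq / (spam_freq + ham_freq)
        return min(0.99, max(0.01, p))  # clamp so no single token is decisive


corpus = BayesCorpus()
corpus.train("cheap viagra now", is_spam=True)            # from the shared corpus
corpus.train("meeting agenda for monday", is_spam=False)  # from the shared corpus
corpus.train("monday cheap offer", is_spam=True)          # a user's later correction
```

After this training, `token_probability("viagra")` is clamped to 0.99, `"agenda"` to 0.01, and `"monday"` comes out at 1/3 because it now appears in one of two spam messages and in the only good message.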
It is pretty easy to alter the behaviour by forwarding good or bad mail to the appropriate addresses. I carried out a survey of our users, who have been training a shared corpus that affects (by tagging, not filtering) both mailing lists and individual accounts. They were very pleased with the overall performance and voted to continue using the system. They were less happy with the effort involved in training, but accepted it because it was obviously (if subjectively) effective in altering the behaviour of the tagger.

> if you are interested we can discuss this in more detail
> off-list,

No, discuss it here, so we can all hear it.

> but my experience is that cooperative work on
> determining what terms, phrases, patterns, etc. are used to catch
> specific material are generally more useful than the sharing of
> mail that has been identified by cooperative efforts as spam.

I believe that this is probably true, but since the Bayesian system can re-create its "patterns" from a collection of mail, there is little real difference between sharing the token probabilities and sharing the source material. In fact it would be better in principle to share the source material, because sharing only the results prevents us from re-analysing the original data, perhaps with new theories, at a later date.

d.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
