Bill,

This is a pretty good point, but it diverges from the issue as far as Chris Means' 
mailet is concerned, because a shared corpus is actually a very good starting point for 
training a system, and more effective than starting from scratch.

It is also easy for users to alter the filter's behaviour by forwarding good or bad 
mail to the appropriate training addresses.
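A minimal sketch of that kind of training loop, assuming the mailet keeps simple per-token message counts for the shared corpus. The addresses `spam-train@example.org` and `ham-train@example.org`, the `SharedCorpus` class and the whitespace tokenizer are all hypothetical illustrations, not the mailet's actual code or configuration.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical training addresses; the real ones are site-specific.
SPAM_ADDRESS = "spam-train@example.org"
HAM_ADDRESS = "ham-train@example.org"


@dataclass
class SharedCorpus:
    """Per-token message counts for a shared corpus (hypothetical structure)."""
    spam_counts: Counter = field(default_factory=Counter)
    ham_counts: Counter = field(default_factory=Counter)
    spam_messages: int = 0
    ham_messages: int = 0

    def train(self, recipient: str, body: str) -> bool:
        """Add forwarded mail to the spam or ham side, chosen by the address it was sent to.

        Returns False (and changes nothing) if the recipient is not a training address.
        """
        # Assumed tokenizer: lower-cased whitespace split, each token counted once per message.
        tokens = set(body.lower().split())
        if recipient == SPAM_ADDRESS:
            self.spam_counts.update(tokens)
            self.spam_messages += 1
        elif recipient == HAM_ADDRESS:
            self.ham_counts.update(tokens)
            self.ham_messages += 1
        else:
            return False
        return True
```

Usage: `corpus.train(SPAM_ADDRESS, forwarded_body)` when a user forwards a missed spam, and `corpus.train(HAM_ADDRESS, forwarded_body)` for a false positive. Every user's correction lands in the same shared counts.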

I carried out a survey of our users, who have been training a shared corpus that 
affects (by tagging, not filtering) both mailing lists and individual accounts. They 
were very pleased with the overall performance and voted to continue using the system. 
They were less happy with the effort involved in training, but accepted it because it 
was obviously (if subjectively) effective in altering the behaviour of the tagger.

> if you are interested we can discuss this in more detail 
> off-list,

No, discuss it here, so we can all hear it.

> but my experience is that cooperative work on 
> determining what terms, phrases, patterns, etc. are used to catch 
> specific material are generally more useful than the sharing of 
> mail that has been identified by cooperative efforts as spam. 

I believe this is probably true, but because the Bayesian system can re-create its 
"patterns" (the token probabilities) from a collection of mail, there is little real 
difference between sharing the token probabilities and sharing the source material. In 
fact it would be better in principle to share the source material, because sharing only 
the results prevents us from re-analysing the original data, perhaps with new theories, 
at a later date.
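To make the argument concrete, here is a sketch of how token probabilities can be rebuilt from raw labelled mail, assuming a simple unsmoothed per-token estimate with equal spam/ham priors (real filters add smoothing and clamping). The tokenizers `words` and `words_and_pairs` are hypothetical: the second stands in for a "new theory" applied later. The point is that the table produced by one tokenizer cannot be converted into the other's; only the original messages allow that.

```python
import re
from collections import Counter
from typing import Callable

Tokenizer = Callable[[str], set]


def words(text: str) -> set:
    """Original (hypothetical) tokenizer: lower-cased whitespace-separated words."""
    return set(text.lower().split())


def words_and_pairs(text: str) -> set:
    """Hypothetical later 'new theory': alphanumeric words plus adjacent word pairs."""
    w = re.findall(r"[a-z0-9']+", text.lower())
    return set(w) | {f"{a} {b}" for a, b in zip(w, w[1:])}


def token_probabilities(spam: list[str], ham: list[str],
                        tokenize: Tokenizer) -> dict[str, float]:
    """Estimate P(spam | token) from raw messages, assuming equal priors and no smoothing."""
    # Number of messages on each side that contain the token at least once.
    spam_counts = Counter(t for msg in spam for t in tokenize(msg))
    ham_counts = Counter(t for msg in ham for t in tokenize(msg))
    probs = {}
    for token in spam_counts.keys() | ham_counts.keys():
        p_spam = spam_counts[token] / len(spam)  # fraction of spam containing token
        p_ham = ham_counts[token] / len(ham)     # fraction of ham containing token
        probs[token] = p_spam / (p_spam + p_ham)
    return probs


spam = ["cheap pills now", "buy cheap pills"]
ham = ["meeting at noon", "pills for the cat"]

p = token_probabilities(spam, ham, words)             # probabilities as originally shared
p2 = token_probabilities(spam, ham, words_and_pairs)  # re-analysis from the same source mail
```

Here `p` has no entry for the phrase "cheap pills", so a site that received only `p` could never derive `p2`; a site holding the shared messages can compute both.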

d.

> 
> b
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 


