On Sun, 15 Feb 2004, Stroller uttered the following immortal words,

> 
> On Feb 15, 2004, at 10:27 pm, Ralph Slooten wrote:
> 
> > On Sun, 15 Feb 2004 20:09:58 +0000
> > Stroller <[EMAIL PROTECTED]> wrote:
> >
> >> "only caught 92% of spam, with 1.16% false positives".
> >
> > So far I've been running it  (bmf) for a week....
> 
> Note that the figure of 92% did *NOT* refer to bmf or Bogofilter, but 
> to a filter written only for some tests done c. 1998.

Yes, we typically expect 99% from Bayesian filters.
 
> > For my "good" mail I fed it with most of my friends ...
> > and from all the mailing lists I belong to from this 2
> > weeks +-(can be downloaded from their monthly archives if need be).
> 
> For me, this would be unrepresentative, because all the messages I 
> receive from mailing lists are filtered into folders based on headers 
> such as "List-Post:" - there's no advantage in statistically 
> classifying them.

Also, I find that Bayesian filters have a problem when you feed them list
mail. Somehow mailing lists look suspiciously like spam to them (because
they contain a lot of unnecessary data?), so if you feed them a large
amount of mailing list data tagged as non-spam, I find the spam detection
rate drops a lot.
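
A minimal sketch of the header-based presorting described above, so that
list mail never reaches the Bayesian filter at all. It assumes Python's
standard email module; the function names and the folder labels are
made up for illustration and are not part of bmf or Bogofilter.

```python
from email import message_from_string


def is_list_mail(raw_message: str) -> bool:
    """Return True if the message carries a List-Post header."""
    msg = message_from_string(raw_message)
    # Header lookup is case-insensitive and returns None when absent.
    return msg["List-Post"] is not None


def route(raw_message: str) -> str:
    """Pick a destination: list mail is filed, the rest gets classified."""
    # "list-folder" and "bayes-filter" are hypothetical labels.
    return "list-folder" if is_list_mail(raw_message) else "bayes-filter"
```

Only messages routed to "bayes-filter" would then be used for training
or classification, which keeps list traffic out of the token database.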

> > For each mail caught as spam, the database automatically updates itself
> > with any new contents of that mail, making it learn as it's catching
> > mails.
> 
> I regard this as risky business - if you fail to reclassify any 
> mistakes, then the filter will be more likely to make errors in future. 
> You do state that bmf allows you to reclassify mistakes, but IMO it's 
> better only to add spam/ham messages to the token database when the 
> user specifically requests them.

Actually, I thought that a Bayesian filter is supposed to do exactly the 
above, i.e. it learns and auto-updates its database, so that as time goes 
by it gets better and better.
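
To make the two approaches concrete, here is a rough sketch of a toy
naive Bayes filter with an optional auto-update switch. This is not how
bmf or Bogofilter are implemented; the class, its methods and the
add-one smoothing are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-based spam filter (illustrative only)."""

    def __init__(self):
        # Per-class count of messages containing each token.
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct token once per message.
        self.tokens[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def spam_score(self, text):
        # Log-odds of spam versus ham, with add-one smoothing.
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in set(text.lower().split()):
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score

    def classify(self, text, auto_update=False):
        label = "spam" if self.spam_score(text) > 0 else "ham"
        if auto_update:
            # Train on the filter's own verdict. A wrong verdict is
            # learned too, unless the user reclassifies it later.
            self.train(text, label)
        return label
```

With auto_update=False the database only changes when train() is called
explicitly, which is the behaviour Stroller prefers. With
auto_update=True every verdict is fed back in, which is the behaviour I
had in mind.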

Grendel

-- 
Grendel's annoyance filter is so advanced it puts people in the killfile 
even before they have posted. 

--
[EMAIL PROTECTED] mailing list
