Re: Serious Proposal to add AccuTechnology(tm) to SpamAssassin (SA)

Shelby Moore 4 Mar 2005 00:49:40 -0000

Brook Humphrey wrote:
> Shelby Moore wrote:
>> SpamAssassin may find eventually it needs to have a global Bayesian
>> database to remain competitive (in terms of false negative and false
>> positive error rates) with systems, such as Death2Spam, etc..
>>
>> BTW, I hear many anecdotal reports of 99% FNR with SpamAssassin (usually
>> they are accompanied with 0% claimed FPR), but real world tests (even using
>> SpamAssassin's corpus) show it is roughly the same as single-user Bayesian
>> systems.  Thus how much you train and fiddle with it are crucial.
>>
>> Whereas, systems such as Death2Spam and AccuTechnology which leverage
>> multi-users in a centralized database are pointing towards much higher
>> performance without increased per user training.  In other words, this is
>> the future of the enterprise anti-spam IMO.  The best anti-spam on the
>> NWFusion study are all large systems that correlate 10000s of users.
>
>Although not particularly on this level, SpamAssassin already includes the
>ability to use a sitewide bayes. Some of us set it up that way by default
>every single time we use it, on every system we do. To do it any other way is
>just inefficient. So basically what you provide is an offsite bayes db for
>everybody to tie into.



Yes, I heard of that from a sysadmin who uses SpamAssassin quite successfully
and who has been advising me on it.  One point he continually makes to me is
that the marginal performance of SA (e.g. going from 9x% to 99%) is very much
correlated with the effort the sysadmin puts into configuring and training it.

My focus has been on comparing systems when they are 100% auto-trained.  This
data is very hard to get, because no one does that (yet!).  My best guess
(based on study at TrecSpam) is that standard SA, auto-trained, is in the range
of 93-95% (5-7% FNR).  AccuTechnology is similar, but that is with only 230
users and only 2 months in operation, and I already see anecdotal evidence (my
business email account) of AccuTechnology climbing to 99+% when it has enough
spam to sample.
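
To make the percentages above concrete, here is a minimal sketch of how a catch
rate, FNR, and FPR relate to raw message counts.  The function name and the
counts are hypothetical, chosen only for illustration; they are not
measurements from SA, AccuTechnology, or TrecSpam.

```python
def filter_rates(spam_caught, spam_missed, ham_passed, ham_blocked):
    """Return (catch_rate, fnr, fpr) as fractions of the relevant totals."""
    spam_total = spam_caught + spam_missed
    ham_total = ham_passed + ham_blocked
    catch_rate = spam_caught / spam_total  # share of spam correctly flagged
    fnr = spam_missed / spam_total         # false negatives: spam that got through
    fpr = ham_blocked / ham_total          # false positives: good mail flagged as spam
    return catch_rate, fnr, fpr


# Hypothetical counts: 1000 spam and 1000 legitimate messages.
catch_rate, fnr, fpr = filter_rates(940, 60, 995, 5)
print(f"catch rate {catch_rate:.1%}, FNR {fnr:.1%}, FPR {fpr:.1%}")
```

Note that the catch rate and the FNR always sum to 100% of the spam, so a
"94%" filter is the same thing as a 6% FNR filter; the FPR is measured against
legitimate mail and has to be reported separately.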

I have seen no single-user auto-trained filter get anywhere near 99% for many
users.  AccuTechnology appears to do that.

Other approaches, where training is shared (a combination of per-user and
global DB), are claimed to get 99.5% for many users:

http://death2spam.com/docs/classifier.html
