Brook Humphrey wrote:
> Shelby Moore wrote:
>> SpamAssassin may find eventually it needs to have a global Bayesian
>> database to remain competitive (in terms of false negative and false
>> positive error rates) with systems, such as Death2Spam, etc..
>>
>> BTW, I hear many anecdotal reports of 99% FNR with SpamAssassin (usually
>> they are accompanied with 0% claimed FPR), but real world tests (even using
>> SpamAssassin's corpus) show it is roughly the same as single-user Bayesian
>> systems. Thus how much you train and fiddle with it are crucial.
>>
>> Whereas, systems such as Death2Spam and AccuTechnology which leverage
>> multi-users in a centralized database are pointing towards much higher
>> performance without increased per user training. In other words, this is
>> the future of the enterprise anti-spam IMO. The best anti-spam on the
>> NWFusion study are all large systems that correlate 10000s of users.
>
>Although not particularly on this level spamassassin already includes the
>ability to use a sitewide bayes. Some of us set it up that way be default
>every single time we use it on every system we do. To do it any other way is
>just inefficient. So basically what you provide is an offsite bayes db for
>everybody to tie into.
Yes, I heard of that from a sys admin who uses SpamAssassin quite successfully and who has been advising me on it. One point he continually makes to me is that SA's marginal performance (e.g. going from 9x% to 99%) is very much correlated with the effort the sys admin puts into configuring and training it.

My focus has been on comparing systems when they are 100% auto-trained. This data is very hard to get, because no one does that (yet!). My best guess (based on study at TrecSpam) is that SA-standard auto-trained is in the range of 93-95% (5-7% FNR) and that AccuTechnology is similar, but with only 230 users and only 2 months in operation. I already see anecdotes (my business email account) of AccuTechnology climbing to 99+% once it has enough spam to sample.

I have seen no single-user auto-trained filter get anywhere near 99% for many users. AccuTechnology appears to do that. Other approaches are claimed to get 99.5% for many users where training is shared (combo of per-user and global DB):

http://death2spam.com/docs/classifier.html
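
To make the percentages above concrete, here is a minimal Python sketch of how a catch rate such as 93-95% relates to a false negative rate of 5-7%, and how the false positive rate is computed separately from the ham side. The function name `error_rates` and the message counts are made up for illustration; they are not measurements from any of the systems discussed.

```python
def error_rates(spam_caught, spam_missed, ham_passed, ham_flagged):
    """Return (catch_rate, fnr, fpr) as fractions between 0 and 1.

    FNR = spam that slipped through / all spam
    FPR = legitimate mail flagged as spam / all legitimate mail
    """
    total_spam = spam_caught + spam_missed
    total_ham = ham_passed + ham_flagged
    if total_spam == 0 or total_ham == 0:
        raise ValueError("need at least one spam and one ham message")
    catch_rate = spam_caught / total_spam
    fnr = spam_missed / total_spam
    fpr = ham_flagged / total_ham
    return catch_rate, fnr, fpr


# Hypothetical counts: 1000 spam (940 caught), 1000 ham (2 wrongly flagged).
catch_rate, fnr, fpr = error_rates(940, 60, 998, 2)
print(f"catch rate {catch_rate:.1%}, FNR {fnr:.1%}, FPR {fpr:.1%}")
```

The catch rate and the FNR always sum to 100%, so a report of "99%" only makes sense as a catch rate (1% FNR), and it says nothing about the FPR unless the ham side was measured too.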
