Re: [dspam-users] multiple use?

Tony Earnshaw Mon, 15 Jan 2007 08:18:42 -0800

Tom Allison wrote:

[...]

If that was the case then why would I consider the wheel since someone might have stumbled on that one too...
I was working on a few assumptions:
a token is a representation of essentially a regex match in either case, CRM114 or Bayes.

"a token is a representation of essentially a regex match ...": utter spout. Did you ever study statistics? I did, as part of my business economy course. It was the only branch of math that ever captured my imagination and made me want to do more.

Any overlap is purely coincidental.


What overlap?

How you manipulate the tokens, based on history, is dependent upon the method of calculation, markov/chi-square/naive, but they are dependent on the same base history of good/bad messages and good/bad tokens.
So a signature can consist of both naive derived tokens and SPBH derived tokens. Any learning or correction of that token will be to apply a correction to the historical count (+1/-1) in either case. So the data and its history remain consistent.
The more variations you can deploy in checking for spam the better the chances that something will get trapped.
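
The idea quoted above (one shared history of per-token good/bad counts, with a correction applied as -1/+1 regardless of how the token was derived) can be sketched roughly as follows. This is an illustration only, not dspam's or CRM114's actual storage code: `TokenHistory`, `unigrams` and `word_pairs` are hypothetical names, and simple word pairs stand in for the SPBH-derived tokens mentioned in the message.

```python
from collections import Counter


class TokenHistory:
    """Hypothetical shared store of per-token spam/ham counts (not dspam's API)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def learn(self, tokens, is_spam):
        # Count each distinct token once per message (+1 on the chosen side).
        target = self.spam if is_spam else self.ham
        for token in set(tokens):
            target[token] += 1

    def correct(self, tokens, now_spam):
        # Undo a wrong classification: -1 on the old side, +1 on the new side.
        old, new = (self.ham, self.spam) if now_spam else (self.spam, self.ham)
        for token in set(tokens):
            if old[token] > 0:
                old[token] -= 1
            new[token] += 1


def unigrams(text):
    # Stand-in for the "naive derived" tokens: single lowercase words.
    return text.lower().split()


def word_pairs(text):
    # Stand-in for the second tokenizer (assumption: adjacent word pairs, not real SPBH).
    words = text.lower().split()
    return [a + " " + b for a, b in zip(words, words[1:])]


history = TokenHistory()
signature = unigrams("cheap pills now") + word_pairs("cheap pills now")
history.learn(signature, is_spam=False)    # wrongly learned as good mail
history.correct(signature, now_spam=True)  # user correction moves every count
```

Whether the counts kept this way carry enough information for every calculation method is exactly what is disputed in this thread; the sketch only shows the bookkeeping.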

I'm happy with the proved 99.26% accuracy after 91,000+ messages with 0.45% false positives with which dspam is bountifully benefiting my high school site (and without much participation on my part) or that of my provenly mostly idle, ignorant and stupid users. That after a couple of years' shooting around with SpamAssassin and constantly using hours on twiddling hundreds of knobs to get half of the accuracy (98%). And not having the chance to give my users (see above) their democratic right to correct mistakes.


How would your users correct their mistakes with your mixture?

The biggest advantage that dspam can provide is a lighter weight naive or chi-square determination, removing some of the more obvious spam via quarantine, followed by the slower CRM114 methodology to further determine what's left over from the bayes determination.
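
The two-stage arrangement described in that paragraph amounts to a cascade: a cheap check quarantines the obvious spam, and only the remainder is passed to the slower check. A minimal sketch, assuming both checks return a spam probability between 0 and 1; `fast` and `slow` are placeholder callables and the thresholds are made up, so none of this is a dspam or CRM114 interface.

```python
def cascade(message, fast, slow, fast_cutoff=0.9, slow_cutoff=0.5):
    """Return (verdict, stage), where stage names the check that decided."""
    # Stage 1: the lightweight check removes only the obvious spam.
    if fast(message) >= fast_cutoff:
        return "quarantine", "fast"
    # Stage 2: everything left over goes to the slower check.
    if slow(message) >= slow_cutoff:
        return "quarantine", "slow"
    return "deliver", "slow"


# Toy scoring functions, purely for illustration.
def toy_fast(message):
    return 0.95 if "viagra" in message.lower() else 0.2


def toy_slow(message):
    return 0.7 if "limited offer" in message.lower() else 0.1


print(cascade("Buy VIAGRA today", toy_fast, toy_slow))
print(cascade("A limited offer for you", toy_fast, toy_slow))
print(cascade("Minutes of the staff meeting", toy_fast, toy_slow))
```

Note that the slow check never sees what the first stage quarantined, so it only helps with spam the first stage missed.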

It probably won't work because there just isn't enough data captured about the tokens.

As I wrote, I'm satisfied with 99.26% accuracy after 91,000+ messages etc. My site's Postfix 2.3 server is refusing (empirically) well over 98% of all potential spam, with around 0.1% of false positives before it ever gets to dspam. Try concentrating on that.


> But if it was truly a bad idea then why do so many
> people use multiple filters to capture spam?

Do they? Is recycling the same message base repeatedly through the same badly configured filter using "multiple filters"? If you want to use multiple filters, then use multiple filters.


--Tonni

--
Tony Earnshaw
Email: tonni at hetnet.nl
