On Fri, Apr 01, 2011 at 03:52:22PM +0300, Ibrahim Harrani wrote:
> Hi,
> 
> I see that people suggest using the OSB tokenizer and adding IgnoreHeader
> entries to avoid useless tokens.
> If OSB is better, why does the dspam package use the chain tokenizer in the
> default dspam.conf, and why does it include only a few IgnoreHeader entries?
> Before I added IgnoreHeader entries, dspam was hitting a great many X headers,
> most of them DKIM headers, which are useless.
> 
> I believe that many new dspam users like me don't know these things and
> generally keep the default values to be safe.
> Training is also the main part of dspam. The better we can explain
> this issue, the more people will use dspam.
> I tried many times before to use dspam, but the spam catch rate was very
> low. Now it is very good, thanks to your suggestions.
> 
> 
> Thanks.
> 

The defaults were set many years ago, when Markov training (OSB, SBPH)
did not even exist. When Markov was introduced, purported performance
problems meant that the CSS backend was the only database backend that
allowed it. Since then, Moore's law has held, and it is now very
reasonable to use Markov (OSB) with any backend. The number of additional
tokens is not unreasonable, and there is plenty of CPU available to
handle the increased work. In addition, fooling OSB is much, much harder
than fooling CHAIN or WORD, which means you need fewer tokens to get the
same or better accuracy.
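To make the token-count trade-off concrete, here is a small Python sketch comparing a CHAIN-style tokenizer (single words plus adjacent word pairs) with a generic OSB tokenizer (each word paired with each of the preceding words in a sliding window). The window size of 5, the "a+#+b" skip notation, and the function names are assumptions for illustration only; DSPAM's own tokenizer may differ in window size, unigram handling, and token format.

```python
from typing import List

# Assumed window: the current word plus up to 4 preceding words.
WINDOW = 5


def chain_tokens(words: List[str]) -> List[str]:
    """CHAIN-style sketch: every single word plus every adjacent word pair."""
    tokens = list(words)
    tokens += [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return tokens


def osb_tokens(words: List[str], window: int = WINDOW) -> List[str]:
    """Generic OSB sketch: pair each word with each of the window-1
    preceding words, writing one '#' per skipped word so that pairs at
    different distances become different tokens."""
    tokens = []
    for i, word in enumerate(words):
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break  # no more preceding words at the start of the text
            skips = "+#" * (distance - 1)
            tokens.append(f"{words[j]}{skips}+{word}")
    return tokens


if __name__ == "__main__":
    msg = "buy cheap meds online now today".split()
    print(len(chain_tokens(msg)), len(osb_tokens(msg)))  # 11 14

    # Junk words interleaved to break up adjacent pairs.
    padded = ["buy", "xx", "cheap", "yy", "meds"]
    print("cheap+meds" in chain_tokens(padded))  # False
    print("cheap+#+meds" in osb_tokens(padded))  # True
```

In this sketch OSB yields up to four pair tokens per word against CHAIN's two tokens per word, and when junk words are interleaved, OSB still emits pairs such as cheap+#+meds that link the real words, while CHAIN emits no pair that joins them.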

Cheers,
Ken

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
