On Fri, Apr 01, 2011 at 03:52:22PM +0300, Ibrahim Harrani wrote:
> Hi,
>
> I see that people suggest using the OSB tokenizer and adding IgnoreHeader
> to avoid useless tokens.
> If OSB is better, why does the dspam package use the chain tokenizer in the
> default dspam.conf, and why doesn't it add a few IgnoreHeader entries to dspam.conf?
> Before I added IgnoreHeaders, dspam was hitting many, many X headers,
> most of them DKIM headers, which are useless.
>
> I believe that many new dspam users like me don't know these things and
> generally use the default values for safety.
> Also, training is the main part of dspam. If we can explain
> this issue better, more people will use dspam.
> I tried many times before to use dspam, but the spam catch rate was very
> low. Now it is very good after your suggestions.
>
> Thanks.
>
The defaults were set many years ago, when Markov training (OSB, SBPH) did
not even exist. Then, when Markov was introduced, purported performance
problems meant that the only database backend that allowed it was the CSS
backend. Since then, Moore's law has held, and it is now very reasonable to
use Markov (OSB) on any backend. The number of additional tokens is not
unreasonable, and there is plenty of CPU available to handle the increased
work. In addition, fooling OSB is much, much harder than fooling CHAIN or
WORD, which means you need fewer tokens to get the same or better accuracy.

Cheers,
Ken

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
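To make the CHAIN-versus-OSB trade-off described above a little more concrete, here is a simplified Python sketch. It is not DSPAM's actual tokenizer. The token format (`+` joins words, `#` marks each skipped word), the window size of 5, and the function names `chain_tokens` and `osb_tokens` are illustrative assumptions. In this model, CHAIN yields about 2 tokens per word and OSB about 4, which corresponds to the "additional tokens" point. Inserting a junk word between two spam words breaks every CHAIN pair that links them, while OSB still emits a pair that spans the junk.

```python
# Simplified illustration only; NOT the DSPAM implementation.
# Assumed token format: words joined by "+", one "#" per skipped word.


def chain_tokens(words: list[str]) -> list[str]:
    """CHAIN-style tokens: every single word plus every adjacent word pair."""
    tokens = list(words)
    tokens += [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return tokens


def osb_tokens(words: list[str], window: int = 5) -> list[str]:
    """OSB-style tokens: pair each word with each of the previous
    (window - 1) words, recording how many words lie between them."""
    tokens = []
    for i, word in enumerate(words):
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break  # no earlier words left in the window
            # e.g. "cheap+#+pills" means one word was skipped
            parts = [words[j]] + ["#"] * (dist - 1) + [word]
            tokens.append("+".join(parts))
    return tokens


if __name__ == "__main__":
    padded = "buy cheap xyzzy pills now".split()
    # CHAIN: no token links "cheap" and "pills" once junk is inserted.
    print([t for t in chain_tokens(padded) if "cheap" in t and "pills" in t])
    # OSB: a skip-bigram still links them across the junk word.
    print([t for t in osb_tokens(padded) if "cheap" in t and "pills" in t])
```

Run as a script, the first list comes out empty and the second contains `cheap+#+pills`. A real classifier then also needs that skip-bigram to have been seen in training. The sketch only shows why OSB leaves fewer gaps for a spammer to exploit, at the cost of roughly twice as many tokens per message as CHAIN.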