On Wed, 26 May 2010 09:23:06 -0500, Kenneth Marshall <k...@rice.edu> wrote:
> On Wed, May 26, 2010 at 04:15:25PM +0200, Stevan Bajić wrote:
>> On Wed, 26 May 2010 06:49:44 -0700, Bradley Giesbrecht
>> <bradley.giesbre...@gmail.com> wrote:
>> >> Most tokens are coming from the X-Greylist header. It would probably
>> >> be a good idea not to check that header in DSPAM.
>> >
>> > Thanks for the tip. I have added "IgnoreHeader X-Greylist" to
>> > dspam.conf. Any other headers I should ignore?
>> >
>> You want my list? I ignore a lot of headers :)
>>
>>
>> > Now that I have changed my tokenizer from chain to osb, is there still value in
>> > the tokens created with the chain tokenizer?
>> >
>> No. You could delete them. They are useless with OSB.
>>
> Couldn't the biGrams produced by the chain tokenizer still
> be used by the osb tokenizer, albeit with reduced accuracy?
>
No and yes. On a window of 5 words, OSB normally produces the following output:

<word4>+<word5>
<word3>+<skip>+<word5>
<word2>+<skip>+<skip>+<word5>
<word1>+<skip>+<skip>+<skip>+<word5>

So only the token <word4>+<word5> is something you would also find in the chain tokenizer's output. That is just one of the four tokens. And since OSB learns so quickly, it is almost pointless to try to reuse chain tokens. I would expect better results from deleting the chain tokens and starting from scratch.

> I am trying to figure out how bad the effect of changing the tokenizer
> would be, because we would like to move to osb, but if the accuracy
> is reduced we may need to use osb only for new users.
>
You cannot mix tokenizers on one DSPAM instance. It is either OSB or CHAIN; you cannot give user A OSB while user B has CHAIN if both users are on the same DSPAM instance.

> Regards,
> Ken

--
Kind Regards from Switzerland,
Stevan Bajić

------------------------------------------------------------------------------
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
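The OSB window discussed in this thread, and why so few of its tokens overlap with chain tokens, can be illustrated with a small Python sketch. This is not DSPAM's actual C implementation: the token strings simply follow the <wordN>+<skip> notation used in the message, the window size of 5 is taken from the message, and the chain tokenizer is assumed to emit every single word plus every adjacent word pair. The helper names osb_tokens and chain_tokens are hypothetical.

```python
from typing import List

# Window size quoted in the message; assumed fixed for this sketch.
WINDOW = 5


def osb_tokens(words: List[str], window: int = WINDOW) -> List[str]:
    """Orthogonal sparse bigrams: pair each word with each of the
    window - 1 words before it, writing <skip> for every word in between."""
    tokens = []
    for i, last in enumerate(words):
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break  # not enough earlier words yet
            tokens.append("+".join([words[j]] + ["<skip>"] * (dist - 1) + [last]))
    return tokens


def chain_tokens(words: List[str]) -> List[str]:
    """Assumed chain output: every single word plus every adjacent pair."""
    return list(words) + [f"{a}+{b}" for a, b in zip(words, words[1:])]


if __name__ == "__main__":
    words = "buy cheap meds online now".split()
    osb = osb_tokens(words)
    shared = set(osb) & set(chain_tokens(words))
    # The four tokens generated for the last word, in the order listed above.
    for token in osb[-4:]:
        print(token)
    print(f"{len(shared)} of {len(osb)} OSB tokens also exist as chain tokens")
```

For the five-word sample this prints the four tokens ending in "now" and reports that 4 of the 10 OSB tokens also exist as chain tokens: only the adjacent pairs (distance 1) are shared, while every sparse pair containing <skip> has no chain counterpart. Plain strings are used here only for readability.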