On Wed, May 26, 2010 at 07:28:58AM -0700, Bradley Giesbrecht wrote:
>
> On May 26, 2010, at 7:15 AM, Stevan Bajić wrote:
>
> > On Wed, 26 May 2010 06:49:44 -0700, Bradley Giesbrecht
> > <bradley.giesbre...@gmail.com> wrote:
> >>> Most tokens are coming from the X-Greylist header. Would probably
> >>> be not bad to not check that header in DSPAM.
> >>
> >> Thanks for the tip. I have added "IgnoreHeader X-Greylist" to
> >> dspam.conf. Any other headers I should ignore?
> >>
> > You want my list? I ignore a lot of headers :)
>
> Ah, I was too obvious :) Yes I would very much like your ignore headers!
>
> >> When I changed my tokenizer from chain to osb is there still value in
> >> the tokens created with the chain tokenizer?
> >>
> > No. You could delete them. They are useless with OSB.
>
> Is there an SQL query that would identify them? I have added a
> timestamp column to all my tables so I could remove all tokens before
> I changed my tokenizer.
>
No, there is no way to identify which tokens correspond to which
tokenizer. The actual tokens are turned into a 64-bit hash and then
stored in the database. You should be able to use the last_hit value
to tell the old tokens from the new tokens and eventually delete them.
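
As a rough sketch of the last_hit approach, something like the following
could work. It assumes the token table looks roughly like the stock
dspam_token_data layout (uid, token, spam_hits, innocent_hits, last_hit),
so check your own schema first. The function name, the cutoff date and the
in-memory SQLite database are only stand-ins for illustration; against a
real MySQL or PostgreSQL backend the same DELETE statement would apply,
with that driver's own placeholder syntax.

```python
import sqlite3


def purge_tokens_before(conn, cutoff_date, uid=None):
    """Delete tokens whose last_hit is older than cutoff_date.

    cutoff_date is an ISO 'YYYY-MM-DD' string, e.g. the day the tokenizer
    was switched. Returns the number of rows deleted.
    """
    sql = "DELETE FROM dspam_token_data WHERE last_hit < ?"
    params = [cutoff_date]
    if uid is not None:
        # Optionally restrict the purge to a single user.
        sql += " AND uid = ?"
        params.append(uid)
    cur = conn.execute(sql, params)
    conn.commit()
    return cur.rowcount


def demo():
    # Hypothetical stand-in for the real token table (assumed layout).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE dspam_token_data (
               uid INTEGER,
               token INTEGER,
               spam_hits INTEGER,
               innocent_hits INTEGER,
               last_hit TEXT
           )"""
    )
    rows = [
        (1, 1001, 3, 0, "2010-04-01"),  # not hit since before the switch
        (1, 1002, 0, 5, "2010-05-20"),  # not hit since before the switch
        (1, 1003, 2, 1, "2010-05-27"),  # hit after the switch, kept
        (2, 2001, 1, 0, "2010-03-15"),  # another user's stale token
    ]
    conn.executemany("INSERT INTO dspam_token_data VALUES (?, ?, ?, ?, ?)", rows)
    deleted = purge_tokens_before(conn, "2010-05-26")
    remaining = [
        r[0]
        for r in conn.execute("SELECT token FROM dspam_token_data ORDER BY token")
    ]
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

This removes every token that has not been hit since the cutoff date,
whichever tokenizer produced it, so it is safer to run a SELECT COUNT(*)
with the same WHERE clause first and to take a backup before deleting.
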
Depending on the accuracy loss, we may need to run two versions of DSPAM
at the same time and use the results from the chain tokenizer to train
the osb system. Then once the training on the new system has converged,
drop the old tokens.

> BTW, do you use OSB?

We want to, but we are currently using CHAIN.

>
> And are there relations between signature_data and token_data? Should
> I remove some of the signature data?
>
The signature data will age out so you do not really need to worry about
it.

Cheers,
Ken

------------------------------------------------------------------------------
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user