On Wed, May 26, 2010 at 07:28:58AM -0700, Bradley Giesbrecht wrote:
> 
> On May 26, 2010, at 7:15 AM, Stevan Bajić wrote:
> 
> > On Wed, 26 May 2010 06:49:44 -0700, Bradley Giesbrecht
> > <bradley.giesbre...@gmail.com> wrote:
> >>> Most tokens are coming from the X-Greylist header. It would probably
> >>> be a good idea not to check that header in DSPAM.
> >>
> >> Thanks for the tip. I have added "IgnoreHeader  X-Greylist" to
> >> dspam.conf. Any other headers I should ignore?
> >>
> > You want my list? I ignore a lot of headers :)
> 
> Ah, I was too obvious :) Yes, I would very much like your ignore headers!
> 
> >> Now that I have changed my tokenizer from chain to osb, is there still
> >> value in the tokens created with the chain tokenizer?
> >>
> > No. You could delete them. They are useless with OSB.
> 
> Is there an SQL query that would identify them? I have added a
> timestamp column to all my tables so I could remove all tokens created
> before I changed my tokenizer.
> 
No, there is no way to identify which tokens correspond to which
tokenizer. The actual tokens are turned into a 64-bit hash and
then stored in the database. You should be able to use the last_hit
value to distinguish the old tokens from the new ones and eventually
delete them.
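
A rough sketch of that last_hit cleanup, in case it helps. It assumes
the stock table and column names (dspam_token_data, last_hit) and that
the old chain tokens stop being hit once the tokenizer is switched, so
their last_hit stays older than the switch date. The cutoff date, the
sample rows and the helper names are made up for illustration, and an
in-memory SQLite database stands in for the real backend. Check the
schema and try it on a copy of the data first.

```python
import sqlite3


def delete_stale_tokens(conn, cutoff_date):
    """Delete tokens whose last_hit is older than cutoff_date.

    cutoff_date is an ISO 'YYYY-MM-DD' string (hypothetical: the day the
    tokenizer was switched). Returns the number of rows removed.
    """
    cur = conn.execute(
        "DELETE FROM dspam_token_data WHERE last_hit < ?", (cutoff_date,)
    )
    conn.commit()
    return cur.rowcount


def make_demo_db():
    """Build an in-memory stand-in for the token table (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dspam_token_data ("
        "uid INTEGER, token INTEGER, spam_hits INTEGER, "
        "innocent_hits INTEGER, last_hit TEXT)"
    )
    conn.executemany(
        "INSERT INTO dspam_token_data VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1001, 5, 0, "2010-05-01"),  # not hit since before the switch
            (1, 1002, 0, 3, "2010-05-19"),  # not hit since before the switch
            (1, 2001, 2, 1, "2010-05-25"),  # hit after the switch, kept
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    removed = delete_stale_tokens(conn, "2010-05-20")
    print(removed, "stale tokens removed")
```

On a production database the same DELETE would go through your usual
SQL client. Waiting a while after the switch before running it gives
tokens that are still in use a chance to have their last_hit updated.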

Depending on the accuracy loss, we may need to run two versions
of DSPAM at the same time and use the results from the chain
tokenizer to train the osb system. Then once the training on the
new system has converged, drop the old tokens.

> BTW, do you use OSB?

We want to, but we are currently using CHAIN.

> 
> And are there relations between signature_data and token_data? Should  
> I remove some of the signature data?
> 
The signature data will age out so you do not really need to worry
about it.

Cheers,
Ken

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
