Campbell Krueger wrote:
Hey everyone,

Well, my DSPAM implementation has dropped to an accuracy of approximately 4% over the past few weeks, and training seems to have absolutely no impact on this. I'm pretty sure this is my own fault, as for a while I figured that running messages through the dspam_train tool repeatedly until they were positively identified was the best way to go (but later started to think about it and realized it probably polluted my training data). So, moving forward, I have the following questions:

1) When initially training, is TEFT the best way to go?
Yes. The first 2500 messages are always trained in TEFT mode regardless.
2) Should I initially train using an extremely large collection of SPAM I already have, as well as all my legit mail?
Yes (in equal proportions)
3) At what point should I switch over to TOE from TEFT?
After the initial training period, i.e. once more than 2500 messages have been trained.
4) What's the best overall procedure to go about training DSPAM?
The exact procedure doesn't matter much, as long as you do it correctly: train each message once and don't re-feed your corpus over and over again.
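
To illustrate the "equal proportions, each message once" idea, here is a minimal Python sketch. It is not part of DSPAM: build_training_plan, the "spam"/"ham" labels, and the file paths are hypothetical names made up for this example. It only builds the list of messages to train on; feeding each entry to DSPAM a single time is left to whatever tool you already use.

```python
import random


def build_training_plan(spam_paths, ham_paths, seed=0):
    """Return a list of (label, path) pairs with equal spam and ham counts.

    Hypothetical helper: duplicates are dropped so no message is trained
    twice, and the larger collection is sampled down to the smaller one.
    """
    rng = random.Random(seed)

    # De-duplicate so the same message is never fed to the trainer twice.
    spam = sorted(set(spam_paths))
    ham = sorted(set(ham_paths))

    # Use the same number of messages from each class.
    n = min(len(spam), len(ham))
    spam = rng.sample(spam, n)
    ham = rng.sample(ham, n)

    # Interleave the two classes so neither dominates early training.
    plan = []
    for s, h in zip(spam, ham):
        plan.append(("spam", s))
        plan.append(("ham", h))
    return plan


if __name__ == "__main__":
    plan = build_training_plan(
        ["spam/1.eml", "spam/2.eml", "spam/3.eml", "spam/1.eml"],
        ["ham/1.eml", "ham/2.eml"],
    )
    for label, path in plan:
        print(label, path)
```

Walk through the resulting plan exactly once; if a message is still misclassified afterwards, correct it as a normal error later rather than looping over it until it sticks.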

And most importantly...

5) How the heck do I purge all the training data already in place for my account?
This depends on your Storage Driver, and how many accounts you have.
If you only have a few users, I would just truncate the token, signature, and stats tables. This assumes you are using an SQL-based storage driver.
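
As a rough sketch of that purge, the Python snippet below only generates the SQL; it does not connect to anything. The table names (dspam_token_data, dspam_signature_data, dspam_stats) and the uid column are assumptions based on the MySQL driver's schema, so check them against your own database first. purge_statements is a hypothetical helper, and the per-user DELETE variant is my own addition for the case where you don't want to wipe every account.

```python
# Assumed table names; verify against your own storage driver's schema.
ASSUMED_TABLES = ("dspam_token_data", "dspam_signature_data", "dspam_stats")


def purge_statements(uid=None, tables=ASSUMED_TABLES):
    """Return SQL statements that clear DSPAM training data.

    With uid=None every table is truncated (all users lose their data).
    With a numeric uid only that user's rows are deleted, assuming each
    table has a 'uid' column.
    """
    if uid is None:
        return [f"TRUNCATE TABLE {t};" for t in tables]

    # Force an integer so nothing unexpected ends up in the SQL text.
    uid = int(uid)
    return [f"DELETE FROM {t} WHERE uid = {uid};" for t in tables]


if __name__ == "__main__":
    for stmt in purge_statements():
        print(stmt)
```

Back up the database before running any of the generated statements, and review them by hand rather than piping them straight into your SQL client.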

I'd sincerely appreciate any information you can give.  Thanks!

Regards,
Campbell Krueger


-Jeff Harris
