Campbell Krueger wrote:
Hey everyone,
Well, my DSPAM implementation has dropped to an accuracy of
approximately 4% over the past few weeks, and training seems to have
absolutely no impact on this. I'm pretty sure this is my own fault,
as for a while I figured that running messages through the dspam_train
tool repeatedly until they were positively identified was the best way
to go (but later started to think about it and realized it probably
polluted my training data). So, moving forward, I have the following
questions:
1) When initially training, is TEFT the best way to go?
Yes. The first 2500 messages are always trained in TEFT mode regardless.
2) Should I initially train using an extremely large collection of
SPAM I already have, as well as all my legit mail?
Yes (in equal proportions)
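To make "equal proportions" concrete, here is a rough sketch of how you might trim the larger pile down before training. The function name and the idea of passing lists of file paths are just assumptions for illustration; nothing here is part of DSPAM itself.

```python
import random


def balanced_corpus(spam_paths, ham_paths, seed=0):
    """Return equal-sized random samples of spam and ham paths.

    Hypothetical helper: the smaller collection sets the sample size,
    so neither class dominates the initial training run.
    """
    n = min(len(spam_paths), len(ham_paths))
    rng = random.Random(seed)  # fixed seed keeps the selection repeatable
    spam_sample = rng.sample(list(spam_paths), n)
    ham_sample = rng.sample(list(ham_paths), n)
    return spam_sample, ham_sample
```

With a huge spam archive and a small pile of legit mail, this keeps all of the legit mail and only a same-sized random slice of the spam.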
3) At what point should I switch over to TOE from TEFT?
After the initial training period (more than 2500 messages trained).
4) What's the best overall procedure to go about training DSPAM?
It doesn't matter much, as long as you do it correctly. (Don't re-feed your
corpus over and over again.)
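One way to guard against re-feeding is to filter the corpus so each message is handed to the trainer only once. This is only a sketch under my own assumptions: messages are raw bytes, and an exact-content hash is a good enough notion of "the same message". The helper name is made up.

```python
import hashlib


def unique_messages(messages):
    """Yield each distinct raw message (bytes) once, in original order.

    Hypothetical helper: duplicates are detected by SHA-256 of the
    full message content, so only byte-identical copies are skipped.
    """
    seen = set()
    for raw in messages:
        digest = hashlib.sha256(raw).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield raw
```

You would run your corpus through this once and train on the result a single time, rather than looping until every message classifies correctly.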
And most importantly...
5) How the heck do I purge all the training data already in place for
my account?
This depends on your storage driver and how many accounts you have.
If you only have a few users, I would just truncate the token, signature,
and stats tables. This assumes you are using an SQL-based storage driver.
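As a rough sketch of what emptying those tables could look like from a script: the table names below are an assumption (check them against your own schema first), and the code uses plain DELETE rather than TRUNCATE so that it works on any DB-API connection. The demo at the bottom runs against an in-memory SQLite database purely for illustration. Note that this wipes the data for every user, so back up first.

```python
import sqlite3

# Assumed table names; verify against your own DSPAM schema before use.
TABLES = ("dspam_token_data", "dspam_signature_data", "dspam_stats")


def purge_training_data(conn, tables=TABLES):
    """Delete every row from the given tables on a DB-API connection.

    This removes training data for ALL users, not just one account.
    """
    cur = conn.cursor()
    for table in tables:
        # Table names cannot be bound as parameters, so they come
        # only from the trusted tuple above.
        cur.execute("DELETE FROM " + table)
    conn.commit()


if __name__ == "__main__":
    # Illustration only: stand-in tables in a throwaway SQLite database.
    demo = sqlite3.connect(":memory:")
    for name in TABLES:
        demo.execute("CREATE TABLE " + name + " (uid INTEGER)")
        demo.execute("INSERT INTO " + name + " VALUES (1)")
    purge_training_data(demo)
    demo.close()
```

For a real server you would pass in a connection from whatever driver matches your storage backend instead of the SQLite demo connection.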
I'd sincerely appreciate any information you can give. Thanks!
Regards,
Campbell Krueger
-Jeff Harris