Re: [dspam-users] Trouble training DSPAM

Chris Baldwin Tue, 04 Nov 2008 12:36:20 -0800

Kyle,

Thanks, that made a world of difference. Luckily, my co-workers like to hang on to old email, so I have a reasonably large selection of both spam and ham. My first pass, just to make sure everything worked, well, worked. This is what dspam_stats is giving me.


   TP:   368 TN:    91 FP:    14 FN:     0 SC:     0 NC:     0
   SHR:  100.00%       HSR:   13.33%       OCA:   97.04%
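
For anyone reading those numbers later: the three percentages appear to follow directly from the four counts. The sketch below is an assumption inferred from this one report, not taken from the dspam_stats source. It treats SHR as TP/(TP+FN), HSR as FP/(TN+FP), and OCA as (TP+TN)/total. The function name dspam_rates is made up for illustration.

```python
def dspam_rates(tp, tn, fp, fn):
    """Return (SHR, HSR, OCA) as percentages.

    The formulas are inferred from the dspam_stats output quoted above;
    they are an assumption, not DSPAM's documented definitions.
    """
    def pct(numerator, denominator):
        # Guard against an empty class (e.g. no ham trained yet).
        return 100.0 * numerator / denominator if denominator else 0.0

    shr = pct(tp, tp + fn)                 # share of spam that was caught
    hsr = pct(fp, tn + fp)                 # share of ham wrongly flagged as spam
    oca = pct(tp + tn, tp + tn + fp + fn)  # share of all messages classified correctly
    return shr, hsr, oca


shr, hsr, oca = dspam_rates(tp=368, tn=91, fp=14, fn=0)
print(f"SHR: {shr:.2f}%  HSR: {hsr:.2f}%  OCA: {oca:.2f}%")
# prints: SHR: 100.00%  HSR: 13.33%  OCA: 97.04%
```

Read that way, the 13.33% is the figure to watch: 14 of the 105 ham messages were classified as spam.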

I just fed dspam_train around 10GB of mail, both spam and ham, so we'll see what happens.



-Chris

Kyle Johnson wrote:

Hello Chris,
You should be using the dspam_train program for training. What you were doing, using dspam with --class (though you would also need --source), is used for retraining (correcting an error).
You need to pass a username, a path to spam, and a path to ham (both of which are in maildir format), to dspam_train:
dspam_train username /path/to/spam /path/to/ham
You can also use an index file, which tells dspam_train where to find the spam and ham files.
If you have a mbox, there are a number of programs on the web which will convert your mail into maildir format.
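
If no converter is handy, Python's standard mailbox module can do the conversion. This is a minimal sketch, not one of the programs referred to above. The function name and both paths are placeholders. It writes a standard maildir (tmp/, new/, cur/), and whether dspam_train wants the top-level directory or one of its subdirectories is not something this thread confirms, so check before pointing it at the result.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a maildir; return the count.

    Both paths are supplied by the caller. The maildir is created if it
    does not exist; the mbox must already exist.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in src:
            dest.add(message)  # each message becomes its own file in the maildir
            count += 1
        dest.flush()
    finally:
        src.close()
        dest.close()
    return count


# Example with placeholder paths:
#   mbox_to_maildir("Spam.train", "/path/to/spam")
```

Run it once for the spam mbox and once for the ham mbox so that both corpora end up in maildir format.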
But remember that you need to train spam and ham. If you train only one, you will probably mess up accuracy.
Hope this helps,
-Kyle
On Tue, Nov 4, 2008 at 2:02 PM, Chris Baldwin <[EMAIL PROTECTED]> wrote:
    Hi,

    I'm having some trouble training DSPAM. I am using an mbox that I
    dumped a fair amount of spam into, and then I'm running this command:

      formail -s dspam --client --user my.username --class=spam --source=corpus --mode=teft < Spam.train &

    However, when I look at the results, all the spam is tagged as
    Innocent:

    15347: [11/04/2008 13:56:14] libdspam returned probability of 0.000000
    15347: [11/04/2008 13:56:14] message result: NOT SPAM
    15347: [11/04/2008 13:56:14] appending header X-DSPAM-Result: Innocent

    Am I missing something here?

    To make things more confusing from my end, dspam_stats tells me
    that every single piece of mail that I've fed dspam is a True
    Negative. The problem is that I've also fed it a few hundred ham
    messages, using the same syntax as above (w/ --class=innocent),
    and getting a similar result as above.

    Here's the configure, just so you know how it's set up:
    ./configure --enable-daemon --enable-syslog \
        --enable-long-usernames --with-storage-driver=hash_drv \
        --with-delivery-agent=procmail --enable-verbose-debug

    I'd appreciate any ideas or suggestions on what to do at this point.

    -Chris Baldwin


!DSPAM:1011,4910b233150921932574660!
