Kyle,

Thanks, that made a world of difference. Luckily, my co-workers like to hang on to old email, so I have a reasonably large selection of both spam and ham. My first pass, just to make sure everything worked, well, worked. This is what dspam_stats is giving me.

   TP:   368 TN:    91 FP:    14 FN:     0 SC:     0 NC:     0
   SHR:  100.00%       HSR:   13.33%       OCA:   97.04%
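The three rates match simple ratios of the counters: SHR = TP/(TP+FN), HSR = FP/(FP+TN), and OCA = (TP+TN)/(TP+TN+FP+FN). For the figures above these are 368/368, 14/105 and 459/473. Those definitions are inferred from these numbers, not taken from the DSPAM source. The function below is a minimal sketch under that assumption. `stats_rates` is a hypothetical name, not a DSPAM tool. It reads a counter line in the format shown and uses awk to recompute the percentages.

```shell
# stats_rates: hypothetical helper, not part of DSPAM.
# Assumes SHR = TP/(TP+FN), HSR = FP/(FP+TN), OCA = (TP+TN)/(TP+TN+FP+FN),
# definitions inferred from the dspam_stats output, not from DSPAM itself.
# Reads "TP: n TN: n FP: n FN: n ..." on stdin and prints the three rates.
stats_rates() {
    awk '
    {
        for (i = 1; i < NF; i++) {
            if ($i == "TP:") tp = $(i + 1)
            else if ($i == "TN:") tn = $(i + 1)
            else if ($i == "FP:") fp = $(i + 1)
            else if ($i == "FN:") fn = $(i + 1)
        }
    }
    END {
        shr = (tp + fn) > 0 ? 100 * tp / (tp + fn) : 0
        hsr = (fp + tn) > 0 ? 100 * fp / (fp + tn) : 0
        total = tp + tn + fp + fn
        oca = total > 0 ? 100 * (tp + tn) / total : 0
        printf "SHR: %.2f%%  HSR: %.2f%%  OCA: %.2f%%\n", shr, hsr, oca
    }'
}
```

Try `echo 'TP: 368 TN: 91 FP: 14 FN: 0' | stats_rates`. It prints `SHR: 100.00%  HSR: 13.33%  OCA: 97.04%`, which matches the dspam_stats figures. The 13.33% ham strike rate is the 14 false positives out of the 105 ham messages scored.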

I just fed dspam_train around 10GB of mail, both spam and ham, so we'll see what happens.


-Chris

Kyle Johnson wrote:
Hello Chris,

You should be using the dspam_train program for training. What you were doing, running dspam with --class (which also needs --source), is meant for retraining, that is, correcting an individual error.

Pass dspam_train a username, a path to your spam, and a path to your ham (both directories in maildir format):
dspam_train username /path/to/spam /path/to/ham
You can also use an index file, which tells dspam_train where to find the spam and ham files.

If you have an mbox, there are a number of programs on the web that will convert your mail to maildir format.
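If no converter is at hand, a rough split is possible with only sh and awk. This is a minimal sketch, and `mbox_to_maildir` is a hypothetical helper, not part of DSPAM. It assumes a plain mbox where every message starts with a line beginning "From " and where body lines that would look like separators were already escaped as ">From ". It does not unescape them. It writes one file per message into DIR/new and creates cur/ and tmp/ so the result has the usual maildir layout. Whether your dspam_train reads new/ as well as cur/ is worth checking before relying on it.

```shell
# mbox_to_maildir MBOX DIR: hypothetical helper, not part of DSPAM.
# Assumes messages are separated by lines beginning with "From ".
# The separator line itself is dropped, since maildir files do not carry it.
mbox_to_maildir() {
    mbox=$1
    dir=$2
    if [ ! -r "$mbox" ]; then
        echo "cannot read mbox: $mbox" >&2
        return 1
    fi
    # Standard maildir layout: cur, new and tmp subdirectories.
    mkdir -p "$dir/cur" "$dir/new" "$dir/tmp" || return 1
    # Write message n to DIR/new/<n zero-padded to 6 digits>.split,
    # closing each file before opening the next one.
    awk -v out="$dir/new" '
        /^From / {
            if (file != "") close(file)
            n++
            file = sprintf("%s/%06d.split", out, n)
            next
        }
        file != "" { print > file }
    ' "$mbox"
}
```

For the file from the original question, `mbox_to_maildir Spam.train spam-maildir` would produce spam-maildir/new/000001.split, 000002.split, and so on. Running it twice into the same directory overwrites earlier files, so use a fresh directory per corpus.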

But remember that you need to train both spam and ham. If you train only one, you will probably hurt accuracy.
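A small guard can catch the one-sided case before training starts. This is a sketch. `train_both` is a hypothetical wrapper that assumes only the `dspam_train username /path/to/spam /path/to/ham` invocation given above. It refuses to run unless both corpus directories exist and each contains at least one file.

```shell
# train_both USER SPAMDIR HAMDIR: hypothetical wrapper around dspam_train.
# Refuses to train unless both corpora exist and are non-empty, because
# training only one class skews the token statistics.
train_both() {
    user=$1
    spamdir=$2
    hamdir=$3
    for d in "$spamdir" "$hamdir"; do
        if [ ! -d "$d" ]; then
            echo "corpus directory not found: $d" >&2
            return 1
        fi
        # A maildir always has cur/new/tmp, so look for actual message files.
        if [ -z "$(find "$d" -type f -print | head -n 1)" ]; then
            echo "corpus directory has no messages: $d" >&2
            return 1
        fi
    done
    dspam_train "$user" "$spamdir" "$hamdir"
}
```

Usage would be `train_both my.username /path/to/spam /path/to/ham`. The check only looks for at least one message per side. It says nothing about whether the two corpora are reasonably balanced.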

Hope this helps,
-Kyle

On Tue, Nov 4, 2008 at 2:02 PM, Chris Baldwin <[EMAIL PROTECTED]> wrote:

    Hi,

    I'm having some trouble training DSPAM. I am using an mbox that I
    dumped a fair amount of spam into, and then I'm running this command:

      formail -s dspam --client --user my.username --class=spam --source=corpus --mode=teft < Spam.train &

    However, when I look at the results, all the spam is tagged as
    Innocent:

    15347: [11/04/2008 13:56:14] libdspam returned probability of 0.000000
    15347: [11/04/2008 13:56:14] message result: NOT SPAM
    15347: [11/04/2008 13:56:14] appending header X-DSPAM-Result: Innocent

    Am I missing something here?

    To make things more confusing from my end, dspam_stats tells me
    that every single piece of mail I've fed dspam is a True
    Negative. The problem is that I've also fed it a few hundred ham
    messages, using the same syntax as above (with --class=innocent),
    and got a similar result.

    Here's the configure, just so you know how it's set up:
    ./configure --enable-daemon --enable-syslog \
        --enable-long-usernames --with-storage-driver=hash_drv \
        --with-delivery-agent=procmail --enable-verbose-debug

    I'd appreciate any ideas or suggestions on what to do at this point.

    -Chris Baldwin
