Kyle,
Thanks, that made a world of difference. Luckily, my co-workers like to
hang on to old email, so I have a reasonably large selection of both
spam and ham. My first pass, meant only to confirm that everything worked,
did work. This is what dspam_stats is giving me:
TP: 368 TN: 91 FP: 14 FN: 0 SC: 0 NC: 0
SHR: 100.00% HSR: 13.33% OCA: 97.04%
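For anyone wanting to sanity-check those percentages, here is a minimal Python sketch that reproduces them from the raw counts. The formulas are an assumption inferred from the numbers above (SHR read as the share of spam that was caught, HSR as the share of ham wrongly flagged, OCA as the share of all messages classified correctly), not taken from the dspam_stats source, and the function name dspam_rates is made up for illustration. SC and NC are both zero here and are left out.

```python
def dspam_rates(tp, tn, fp, fn):
    """Return (SHR, HSR, OCA) as percentages.

    Assumed formulas, inferred from the dspam_stats output in this thread:
      SHR = TP / (TP + FN)            spam that was caught
      HSR = FP / (TN + FP)            ham that was wrongly flagged
      OCA = (TP + TN) / all messages  overall accuracy
    """
    spam_total = tp + fn
    ham_total = tn + fp
    total = spam_total + ham_total
    # Guard against empty classes so an untrained user does not divide by zero.
    shr = 100.0 * tp / spam_total if spam_total else 0.0
    hsr = 100.0 * fp / ham_total if ham_total else 0.0
    oca = 100.0 * (tp + tn) / total if total else 0.0
    return shr, hsr, oca


# Counts copied from the dspam_stats line above.
shr, hsr, oca = dspam_rates(tp=368, tn=91, fp=14, fn=0)
print(f"SHR: {shr:.2f}% HSR: {hsr:.2f}% OCA: {oca:.2f}%")
# prints: SHR: 100.00% HSR: 13.33% OCA: 97.04%
```

Under that reading, the 14 false positives out of 105 ham messages are what pull HSR up to 13.33%, so that is the figure to watch after the larger training run.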
I just fed dspam_train around 10GB of mail, both spam and ham, so we'll
see what happens.
-Chris
Kyle Johnson wrote:
Hello Chris,
You should be using the dspam_train program for training. What you
were doing, using dspam with --class (though you would also need
--source), is used for retraining (correcting an error).
You need to pass a username, a path to spam, and a path to ham (both
of which are in maildir format), to dspam_train:
dspam_train username /path/to/spam /path/to/ham
You can also use an index file, which tells dspam_train where to find
the spam and ham files.
If you have an mbox, there are a number of programs on the web which
will convert your mail into maildir format.
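If you would rather not hunt for a converter, the conversion can probably be done with Python's standard mailbox module. The following is only a sketch: the function name mbox_to_maildir and the paths in the usage note are made up for illustration, and it has not been tried against dspam_train itself.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a maildir.

    Returns the number of messages copied. The maildir (with its
    cur/, new/ and tmp/ subdirectories) is created if it is missing.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in src:
            # Each message is written as its own file under new/.
            dest.add(message)
            count += 1
        dest.flush()
    finally:
        src.close()
        dest.close()
    return count
```

Calling mbox_to_maildir("Spam.train", "/path/to/spam") once for the spam mbox and once more for a ham mbox would give the two directories to hand to dspam_train.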
But remember that you need to train spam and ham. If you train only
one, you will probably mess up accuracy.
Hope this helps,
-Kyle
On Tue, Nov 4, 2008 at 2:02 PM, Chris Baldwin
<[EMAIL PROTECTED]> wrote:
Hi,
I'm having some trouble training DSPAM. I am using an mbox that I
dumped a fair amount of spam into, and then I'm running this command:
formail -s dspam --client --user my.username --class=spam
--source=corpus --mode=teft < Spam.train &
However, when I look at the results, all the spam is tagged as
Innocent:
15347: [11/04/2008 13:56:14] libdspam returned probability of 0.000000
15347: [11/04/2008 13:56:14] message result: NOT SPAM
15347: [11/04/2008 13:56:14] appending header X-DSPAM-Result: Innocent
Am I missing something here?
To make things more confusing from my end, dspam_stats tells me
that every single piece of mail that I've fed dspam is a True
Negative. The problem is that I've also fed it a few hundred ham
messages, using the same syntax as above (w/ --class=innocent),
and gotten a similar result.
Here's the configure, just so you know how it's set up:
./configure --enable-daemon --enable-syslog
--enable-long-usernames --with-storage-driver=hash_drv
--with-delivery-agent=procmail --enable-verbose-debug
I'd appreciate any ideas or suggestions on what to do at this point.
-Chris Baldwin