Can someone help me with this?
I am still unable to get dspam to classify some messages correctly.
Taking one message as an example: I have repeatedly fed it to dspam as
spam (--class=spam --source=corpus), but dspam continues to classify it
as innocent.
My dspam_stats output currently shows:
TP: 7595 TN: 6121 FP: 1 FN: 133 SC: 64 NC: 0
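To put those counters in perspective, here is a minimal sketch in Python that parses the line above and works out the share of spam that was missed. It assumes the counters carry their usual dspam_stats meanings (TP/TN true positives/negatives, FP/FN false positives/negatives, SC/NC spam and innocent messages fed as corpus). The function names are made up for illustration and are not part of dspam.

```python
import re

def parse_dspam_stats(line):
    """Parse a dspam_stats summary line into a dict of integer counters."""
    # Each counter looks like "TP: 7595"; collect every NAME: NUMBER pair.
    return {name: int(value) for name, value in re.findall(r"([A-Z]+):\s*(\d+)", line)}

def false_negative_rate(stats):
    """Fraction of all spam seen that was missed: FN / (TP + FN)."""
    spam_seen = stats["TP"] + stats["FN"]
    return stats["FN"] / spam_seen if spam_seen else 0.0

stats = parse_dspam_stats("TP: 7595 TN: 6121 FP: 1 FN: 133 SC: 64 NC: 0")
print(f"false-negative rate: {false_negative_rate(stats):.2%}")  # prints 1.72%
```

With these numbers, 133 of 7728 spam messages were missed.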
To test the message (one of those Viagra spams), I pipe it into the
command:
dspam --user <user> --stdout --deliver=innocent
and I always get:
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8749
X-DSPAM-Probability: 0.0000
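For anyone who wants to script this check instead of reading the headers by eye, here is a minimal sketch in Python that pulls the X-DSPAM-* headers out of a message. It assumes the text written by dspam --stdout has already been captured into a string. The helper name dspam_headers and the sample message are hypothetical.

```python
from email import message_from_string

def dspam_headers(raw_message):
    """Return the X-DSPAM-* headers of a raw message as a dict."""
    msg = message_from_string(raw_message)
    # Header names are case-insensitive, so compare in upper case.
    return {k: v for k, v in msg.items() if k.upper().startswith("X-DSPAM-")}

# Hypothetical sample built from the headers quoted above.
sample = (
    "X-DSPAM-Result: Innocent\n"
    "X-DSPAM-Confidence: 0.8749\n"
    "X-DSPAM-Probability: 0.0000\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
headers = dspam_headers(sample)
print(headers["X-DSPAM-Result"], headers["X-DSPAM-Probability"])
```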
----- Original Message -----
From: "Ricardo Kleemann" <[EMAIL PROTECTED]>
To: "David Rees" <[EMAIL PROTECTED]>; <[email protected]>
Sent: Wednesday, February 07, 2007 12:54 PM
Subject: Re: [dspam-users] false negatives
But I've trained with over 5000 messages, both negative and positive...
I've just run dspam with --class=spam --source=corpus, and afterwards
the message is still classified as innocent.
----- Original Message -----
From: "David Rees" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, February 07, 2007 12:49 PM
Subject: Re: [dspam-users] false negatives
On 2/7/07, Arnaldo Mandel <[EMAIL PROTECTED]> wrote:
Ricardo Kleemann wrote (on Feb 7, 2007):
> I've installed and configured dspam, and trained it with my own set of
> messages.
>
> But I still get false negatives when testing dspam with a spam message that
> was fed as spam into dspam_train. I've also called dspam with --class=spam,
> feeding it the message, but when I test the message (using dspam --stdout)
> the message still comes out as innocent.
Have you used the parameter --source=corpus for dspam?
After some feeding it will certainly classify as spam, but in the
process you may bias your token database in an unwanted way.
Yep, and also keep in mind the number of messages you have trained
dspam with. Generally a couple thousand are required before dspam
really starts getting good.
-Dave