On Jun 25, 2010, at 8:20 AM, Ben Luey wrote:

> So I changed the tokenizer to osb, but left the trainingmode at TEFT.
> Re-ran a big spam corpus and some ham and things started working
> better.
> I think the main problem was my confusion over the X-DSPAM-Probability
> header. It appears that X-DSPAM-Probability is either 0.00000 (not
> spam) or 1.00000 (spam). So it isn't really a probability but a binary
> spam/notspam.

I'm too new to dspam to offer suggestions, but I can say that this
assumption is wrong.

Here are my dspam headers from a sample spam email.

##
X-Dspam-Result:         Spam
X-Dspam-Processed:      Thu Jun 24 22:24:30 2010
X-Dspam-Confidence:     0.4884
X-Dspam-Improbability:  1 in 96 chance of being ham
X-Dspam-Probability:    0.9113
X-Dspam-Signature:      11,4c243d8e1239212984381
X-Dspam-Factors:        15, Received*Thu+#+#+#+22, 0.01000,
Received*triband+mum, 0.84064, Received*triband+#+#+(triband, 0.84064,
Received*from+#+#+#+(triband, 0.84064, Received*mum+#+(triband, 0.84064,
Received*from+triband, 0.84064, Received*triband+#+#+#+mum, 0.84064,
Received*with+SMTP, 0.16312, Received*SMTP+id, 0.17381,
Received*(triband+mum, 0.82486, Received*(Postfix)+#+SMTP, 0.19235,
Received*pixilla.com>+Thu, 0.20739, Received*mum+#+#+mum, 0.79026,
Received*from+#+mum, 0.79026, Received*by+#+#+#+SMTP, 0.21654
##
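
For what it's worth, the factor list is easier to read once it is split into pairs. Here is a minimal Python sketch that does that. It assumes the layout seen in the sample above (a leading count, then alternating token and value entries) and that tokens never contain whitespace, so line wraps can simply be joined. That layout is inferred from this one header, not taken from the dspam documentation, and `parse_factors` is my own name, not part of dspam.

```python
import re

# One factor is "<token>, <value>", where the value looks like 0.84064.
# Assumption: layout inferred from the sample header, not from dspam docs.
_FACTOR = re.compile(r"\s*(.+?),\s*([01]\.\d+)\s*(?:,|$)")


def parse_factors(value):
    """Return (count, [(token, value), ...]) from an X-Dspam-Factors value."""
    # Undo mail-client line wrapping; assumes tokens contain no whitespace.
    flat = re.sub(r"\s*\n\s*", "", value.strip())
    # The first comma-separated field is the number of factors.
    count_text, _, rest = flat.partition(",")
    factors = [(m.group(1), float(m.group(2))) for m in _FACTOR.finditer(rest)]
    return int(count_text), factors


if __name__ == "__main__":
    # Shortened excerpt of the spam sample (count adjusted to 3 to match).
    excerpt = (
        "3, Received*Thu+#+#+#+22, 0.01000, Received*triband \n"
        "+mum, 0.84064, Received*with \n"
        "+SMTP, 0.16312"
    )
    count, factors = parse_factors(excerpt)
    for token, value in factors:
        print(f"{value:.5f}  {token}")
```

Sorting the pairs by value makes it quick to see which tokens pushed the message toward spam (values near 1) and which toward ham (values near 0).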

And here is a sample ham email. Every ham email I have looked at so far
has "X-Dspam-Probability:      0.0000".

##
X-Dspam-Result:         Innocent
X-Dspam-Processed:      Wed Jun 23 14:02:29 2010
X-Dspam-Confidence:     0.5156
X-Dspam-Improbability:  1 in 107 chance of being spam
X-Dspam-Probability:    0.0000
X-Dspam-Signature:      11,4c2276651232104920670
X-Dspam-Factors:        27, Received*for+#+#+#+23, 0.99000,
Received*for+#+#+#+23, 0.99000, Received*Wed+23, 0.99000,
Received*Wed+23, 0.99000,
https+#+https, 0.01000, https+#+https, 0.01000, so+#+#+are, 0.01000,  
Date*Wed+23, 0.99000, Content-Type*multipart/alternative+#+#+1,  
0.01000, Date*23+Jun, 0.99000, Received*23+Jun, 0.99000,  
Received*23+Jun, 0.99000, 20+https, 0.01000, CLIENTS+WITH, 0.01000,  
popular+#+#+#+the, 0.01000, we+are, 0.01000, Date*54+0500, 0.01000,  
20+#+#+https, 0.01000, were+so, 0.01000, //+#+//, 0.01000, //+#+//,  
0.01000, Date*23+#+2010, 0.99000, Received*23+#+2010, 0.99000,  
Received*23+#+2010, 0.99000, so+#+we, 0.01000, Phone+#+#+#+Fax,  
0.01000, Phone+#+#+#+Fax, 0.01000
##
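
If you want to check this on your own mail rather than eyeball it, something like the following Python sketch could collect the values from a batch of messages. It assumes you can get each message as raw text. The helper names `dspam_scores` and `looks_binary` are made up for this example. Header lookup in Python's standard `email` package is case-insensitive, so it should not matter whether the headers are written X-DSPAM-... or X-Dspam-....

```python
from email import message_from_string


def dspam_scores(raw_message):
    """Return (result, probability, confidence) from one raw message."""
    msg = message_from_string(raw_message)
    # Missing headers become an empty result or NaN instead of raising.
    result = (msg.get("X-DSPAM-Result") or "").strip()
    probability = float(msg.get("X-DSPAM-Probability", "nan"))
    confidence = float(msg.get("X-DSPAM-Confidence", "nan"))
    return result, probability, confidence


def looks_binary(probabilities):
    """True if every probability is exactly 0.0 or 1.0."""
    return all(p in (0.0, 1.0) for p in probabilities)
```

With the two samples above, the probabilities are 0.9113 and 0.0000, so `looks_binary` returns False for that pair.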

> When I kept seeing headers with 0.00000, I thought dspam
> was way off as there should be at least some chance an email was spam.
> In reality, X-DSPAM-Confidence is the metric to look at to see how
> 'close' dspam was on a false positive / negative, while probability
> doesn't contain any additional information.
>
> Ben
>
> Ben Luey wrote:
>> Just wanted to check in again. Every incoming email gets
>>
>> X-DSPAM-Probability: 0.0000
>>
>> I'm not a statistician, but this can't be right. I've trained Dspam
>> (3.9.0) on hundreds of spam / not spam from the SA publiccorpus and a
>> spam-free folder. Every time I get a spam message I retrain the
>> filter.
>> But still, even blatant spam gets X-DSPAM-Probability: 0.0000. The
>> X-DSPAM-Confidence varies from 50% to 100% where the lower the
>> confidence, the more likely it is spam.
>>
>> This can't be normal -- is dspam in some training mode or something?
>> Also, I turned on show factors in my configuration. In case this is
>> helpful, below are the factors of a 53% confidence, 0% probability
>> blatant spam message I got:
>>
>> X-Original-To*vescentphotonics.com, 0.00313,
>> Received*vescentphotonics.com>, 0.00447,    Received*2010+13,
>> 0.00479,    Received*2010+13, 0.00479,
>> Received*by+mail.vescentphotonics.com, 0.00533,
>> Received*mail.vescentphotonics.com, 0.00533,
>> Received*mail.vescentphotonics.com+(Postfix), 0.00534,
>> X-Original-To*bugreporter, 0.00691,    Date*2010, 0.00938,
>> Received*for+<bugreporter, 0.00944,    Received*<bugreporter,
>> 0.00944,    Received*2010, 0.00999,    Received*2010, 0.00999,
>> Content-Type*1251", 0.99000,    X-Greylist*45, 0.01000,    DEAR,
>> 0.99000,    X-MimeOLE*MimeOLE+V6.00.2600.0000, 0.99000,    aside,
>> 0.99000,    X-Mailer*Express+6.00.2600.0000, 0.99000,
>> operation+to, 0.99000,    the+deceased, 0.99000,    consent,
>> 0.99000,    await+your,
>> 0.99000,    set+aside, 0.99000,    Date*2010+13, 0.01000,
>> this+transaction, 0.99000,    this+transaction, 0.99000
>>
>> Thanks,
>>
>> Ben
>>
>> ------------------------------------------------------------------------------
>> ThinkGeek and WIRED's GeekDad team up for the Ultimate
>> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
>> lucky parental unit.  See the prize list and enter to win:
>> http://p.sf.net/sfu/thinkgeek-promo
>> _______________________________________________
>> Dspam-user mailing list
>> Dspam-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspam-user
>>
>
>

