So I changed the tokenizer to osb, but left the training mode at TEFT. 
I re-ran a big spam corpus and some ham, and things started working better. 
I think the main problem was my confusion over the X-DSPAM-Probability 
header. It appears that X-DSPAM-Probability is either 0.00000 (not spam) 
or 1.00000 (spam), so it isn't really a probability but a binary 
spam / not spam flag. When I kept seeing headers with 0.00000, I thought 
dspam was way off, as there should be at least some chance an email was 
spam. In reality, X-DSPAM-Confidence is the metric to look at to see how 
close dspam came to a false positive or false negative, while the 
probability header doesn't contain any additional information.
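
For anyone scripting against these headers, here is a minimal Python sketch 
of the reading described above: treat X-DSPAM-Probability as the binary 
verdict and X-DSPAM-Confidence as the measure of how close the call was. 
The function names, the sample message, and the review_below threshold are 
my own assumptions for illustration, not anything dspam defines, and I am 
assuming the confidence arrives either as a fraction (0.53) or as a 
percentage (53%).

```python
from email import message_from_string


def parse_confidence(value):
    """Return confidence as a fraction; accepts '0.53' or '53%' (assumed formats)."""
    text = value.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)


def dspam_verdict(raw_message, review_below=0.60):
    """Summarize the two DSPAM headers of one message.

    review_below is a hypothetical cutoff for "worth a second look";
    it is not a dspam setting.
    """
    msg = message_from_string(raw_message)
    probability = msg.get("X-DSPAM-Probability")
    confidence = msg.get("X-DSPAM-Confidence")
    if probability is None or confidence is None:
        return None  # headers missing, message was not tagged
    conf = parse_confidence(confidence)
    return {
        # Probability is read as a yes/no flag: 0 means not spam, 1 means spam.
        "spam": float(probability) >= 0.5,
        "confidence": conf,
        # Low confidence marks a verdict that could easily have gone the other way.
        "borderline": conf < review_below,
    }


# Made-up message modelled on the 53% confidence, 0% probability case.
sample = (
    "Subject: this transaction\n"
    "X-DSPAM-Probability: 0.0000\n"
    "X-DSPAM-Confidence: 0.53\n"
    "\n"
    "DEAR friend, I await your consent.\n"
)

if __name__ == "__main__":
    print(dspam_verdict(sample))
```

A message like the blatant spam I quoted below would come back as not spam 
but borderline, which is the cue to retrain on it.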

Ben

Ben Luey wrote:
> Just wanted to check in again. Every incoming email gets
>
> X-DSPAM-Probability: 0.0000
>
> I'm not a statistician, but this can't be right. I've trained Dspam 
> (3.9.0) on hundreds of spam / not spam messages from the SA publiccorpus 
> and a spam-free folder. Every time I get a spam message I retrain the filter. 
> But still, even blatant spam gets X-DSPAM-Probability: 0.0000. The 
> X-DSPAM-Confidence varies from 50% to 100% where the lower the 
> confidence, the more likely it is spam.
>
> This can't be normal -- is dspam in some training mode or something? 
> Also, I turned on show factors in my configuration. In case this is 
> helpful, below are the factors of a 53% confidence, 0% probability 
> blatant spam message I got:
>
> X-Original-To*vescentphotonics.com, 0.00313
> Received*vescentphotonics.com>, 0.00447
> Received*2010+13, 0.00479
> Received*2010+13, 0.00479
> Received*by+mail.vescentphotonics.com, 0.00533
> Received*mail.vescentphotonics.com, 0.00533
> Received*mail.vescentphotonics.com+(Postfix), 0.00534
> X-Original-To*bugreporter, 0.00691
> Date*2010, 0.00938
> Received*for+<bugreporter, 0.00944
> Received*<bugreporter, 0.00944
> Received*2010, 0.00999
> Received*2010, 0.00999
> Content-Type*1251", 0.99000
> X-Greylist*45, 0.01000
> DEAR, 0.99000
> X-MimeOLE*MimeOLE+V6.00.2600.0000, 0.99000
> aside, 0.99000
> X-Mailer*Express+6.00.2600.0000, 0.99000
> operation+to, 0.99000
> the+deceased, 0.99000
> consent, 0.99000
> await+your, 0.99000
> set+aside, 0.99000
> Date*2010+13, 0.01000
> this+transaction, 0.99000
> this+transaction, 0.99000
>
> Thanks,
>
> Ben
>
> ------------------------------------------------------------------------------
> ThinkGeek and WIRED's GeekDad team up for the Ultimate 
> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
> lucky parental unit.  See the prize list and enter to win: 
> http://p.sf.net/sfu/thinkgeek-promo
> _______________________________________________
> Dspam-user mailing list
> Dspam-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspam-user
>   


