Hi Tom,

Which DSPAM version are you using? How do you train? Which tokenizer
do you use during training and after training?
DSPAM is very sensitive to training. If you don't train well enough,
or if you train too much, you may have trouble.
There are also many headers you should ignore. You can get the list from:
http://sourceforge.net/apps/mediawiki/dspam/index.php?title=Working_DSPAM%2BPOSTFIX%2BMYSQL%2BCLAMAV_Setup_by_PaulC
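
Ignoring such headers matters with the OSB tokenizer because a single header line turns into many word-pair tokens. Here is a minimal Python sketch of OSB (orthogonal sparse bigrams). It is not DSPAM's actual code: it assumes a 5-word window, a split on whitespace, commas, slashes and brackets, and the `Header*word+#+word` token format seen in your X-DSPAM-Factors header. The `osb_tokens` name and the sample X-AntiAbuse text are hypothetical; the text is reconstructed from your factor tokens.

```python
import re


def osb_tokens(header, text, window=5):
    """Build OSB-style word pairs for one header value (illustrative only).

    Each word is paired with each of the next (window - 1) words, and every
    skipped position is marked with '#', e.g. 'header+#+#+#+track'.
    The delimiter set and window size are assumptions, not DSPAM's own.
    """
    words = [w for w in re.split(r"[\s,/\[\]]+", text) if w]
    tokens = []
    for i, first in enumerate(words):
        for dist in range(1, window):
            j = i + dist
            if j >= len(words):
                break
            # One '#' per word skipped between the two paired words.
            gap = "+#" * (dist - 1)
            tokens.append(f"{header}*{first}{gap}+{words[j]}")
    return tokens


# Hypothetical header value, reconstructed from the quoted factor tokens.
sample = ("This header was added to track abuse, "
          "please include it with any abuse report")
toks = osb_tokens("X-AntiAbuse", sample)
print(len(toks))
for t in toks[:6]:
    print(t)
```

Seven of the 15 tokens in your X-DSPAM-Factors header (header+was, was+#+to, to+#+abuse, track+abuse, header+#+#+#+track, please+#+it, with+#+#+report) come out of this one sentence; the other X-AntiAbuse lines in the message presumably supply the rest.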

Also, if you uploaded your spam/ham corpus from Windows to Unix/Linux,
you should have DSPAM strip the carriage returns (^M) by adding the
following line to dspam.conf. I had this problem before; in that case
DSPAM was only checking the headers for the classification.

# Specifying 'lineStripping' causes DSPAM to strip ^M's from messages
# passed in.
Broken lineStripping
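
Before retraining, it may help to check whether the corpus files still contain ^M characters. A minimal Python sketch, assuming one message per file somewhere under a corpus directory; `find_crlf_files` and `strip_cr` are hypothetical helper names, and rewriting files in place is only safe on a backup copy:

```python
import sys
from pathlib import Path


def find_crlf_files(corpus_dir):
    """Return files under corpus_dir that contain CRLF (^M) line endings."""
    hits = []
    for path in sorted(Path(corpus_dir).rglob("*")):
        if path.is_file() and b"\r\n" in path.read_bytes():
            hits.append(path)
    return hits


def strip_cr(path):
    """Rewrite one file in place, converting CRLF to LF (use on a copy)."""
    data = path.read_bytes()
    path.write_bytes(data.replace(b"\r\n", b"\n"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only lists affected files; call strip_cr() yourself if you want to fix them.
    for p in find_crlf_files(sys.argv[1]):
        print(p)
```

Run it with the corpus directory as the only argument to list the affected files.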

If you have the same problem, you may have to retrain your DSPAM data.
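
About the factors in your header: if, as in Paul Graham's original scheme, each value is the token's probability of indicating spam, then 0.99649 counts strongly toward spam, not ham. The sketch below applies the textbook Graham combining rule P = prod(p) / (prod(p) + prod(1 - p)) in Python. DSPAM's own graham/burton code with `PValue bcr` may differ in detail, so treat this as an illustration only.

```python
from math import prod


def graham_combine(probs):
    """Combine per-token spam probabilities with Graham's rule (illustrative).

    P = prod(p) / (prod(p) + prod(1 - p)); values near 1.0 mean spam.
    """
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# The 15 factors in the quoted header all have p = 0.99649.
print(f"{graham_combine([0.99649] * 15):.4f}")  # prints 1.0000
# For contrast: 15 clearly hammy tokens.
print(f"{graham_combine([0.01] * 15):.4f}")  # prints 0.0000
```

With 15 factors at 0.99649 the result rounds to 1.0000, the same value as the X-DSPAM-Probability header in your message.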

Thanks.

On Fri, Apr 22, 2011 at 9:17 AM, Tom Hendrikx <t...@whyscream.net> wrote:
> Hi,
>
> In my current setup I just received my first FP. DSPAM is set up to add
> the X-DSPAM-Factors header to classified e-mails, but after reviewing the
> data, I don't understand why DSPAM decided to classify the message as
> spam. Also, the X-DSPAM-Improbability header has weird contents.
>
> Does the X-DSPAM-Factors header contain all of the tokens used to classify
> the message, or only a subset of them? I ask because the headers in the
> FP message do not explain why it happened:
>
> X-DSPAM-Result: Spam
> X-DSPAM-Processed: Fri Apr 22 01:01:29 2011
> X-DSPAM-Confidence: 0.9963
> X-DSPAM-Improbability: 1 in 26939 chance of being ham
> X-DSPAM-Probability: 1.0000
> X-DSPAM-Signature: 1,4db0b74991741873512032
> X-DSPAM-Factors: 15,
>        X-AntiAbuse*Original+#+-, 0.99649,
>        X-AntiAbuse*Caller+#+GID, 0.99649,
>        X-AntiAbuse*Sender+#+Domain, 0.99649,
>        X-AntiAbuse*please+#+it, 0.99649,
>        X-AntiAbuse*with+#+#+report, 0.99649,
>        X-AntiAbuse*to+#+abuse, 0.99649,
>        X-AntiAbuse*Primary+#+-, 0.99649,
>        X-AntiAbuse*Original+Domain, 0.99649,
>        X-AntiAbuse*GID+-, 0.99649,
>        X-AntiAbuse*Sender+#+#+-, 0.99649,
>        X-AntiAbuse*track+abuse, 0.99649,
>        X-AntiAbuse*header+was, 0.99649,
>        X-AntiAbuse*header+#+#+#+track, 0.99649,
>        X-AntiAbuse*was+#+to, 0.99649,
>        X-AntiAbuse*Originator+Caller, 0.99649
>
> According to the scoring of the listed tokens, I think this message
> should be marked as ham, not as spam. Relevant values from dspam.conf:
>
> TrainingMode teft
> ImprobabilityDrive on
> Algorithm graham burton
> Tokenizer osb
> PValue bcr
>
> All of the above is running on a git tip checkout from 2011-03-01.
>
> Kind regards,
>
>        Tom
>
>
> FWIW: after reviewing this message, I added the X-AntiAbuse header to
> the IgnoreHeader list, because I concluded that the header is pretty
> useless for classification.
>
>
> --
> New PGP key: 7D54EFF5
> Fingerprint: C26F 374F 5E13 157B 5B42  7A1B 93DF 319D 7D54 EFF5
> http://www.whyscream.net/key-transition-2011-03-30.txt.asc
>
> ------------------------------------------------------------------------------
> Fulfilling the Lean Software Promise
> Lean software platforms are now widely adopted and the benefits have been
> demonstrated beyond question. Learn why your peers are replacing JEE
> containers with lightweight application servers - and what you can gain
> from the move. http://p.sf.net/sfu/vmware-sfemails
> _______________________________________________
> Dspam-user mailing list
> Dspam-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspam-user
>
