Re: [Dspam-user] Understanding classification: dspam factors?

Stevan Bajić Sat, 23 Apr 2011 03:32:44 -0700

On Fri, 22 Apr 2011 10:17:17 +0200
Tom Hendrikx <t...@whyscream.net> wrote:


> 
> Hi,
> 
Hello Tom,


> In my current setup I just received my first FP.
>
I hope it is an old setup? One FP is not a big deal.


> Dspam is setup to add
> the dspam-factors header to classified e-mails, but after reviewing the
> data, I don't understand why dspam decided to classify the message as
> spam. Also the X-DSPAM-Improbability header has weird contents.
> 
> Does the dspam_factors header contain all of the tokens used to classify
> the message, or only a subset of them?
>
All of them. But if you use more than one algorithm, only the factors from
the first one will be shown in X-DSPAM-Factors.


> Because the headers in the FP
> message do not explain why it happens:
> 
> X-DSPAM-Result: Spam
> X-DSPAM-Processed: Fri Apr 22 01:01:29 2011
> X-DSPAM-Confidence: 0.9963
> X-DSPAM-Improbability: 1 in 26939 chance of being ham
> X-DSPAM-Probability: 1.0000
> X-DSPAM-Signature: 1,4db0b74991741873512032
> X-DSPAM-Factors: 15,
>       X-AntiAbuse*Original+#+-, 0.99649,
>       X-AntiAbuse*Caller+#+GID, 0.99649,
>       X-AntiAbuse*Sender+#+Domain, 0.99649,
>       X-AntiAbuse*please+#+it, 0.99649,
>       X-AntiAbuse*with+#+#+report, 0.99649,
>       X-AntiAbuse*to+#+abuse, 0.99649,
>       X-AntiAbuse*Primary+#+-, 0.99649,
>       X-AntiAbuse*Original+Domain, 0.99649,
>       X-AntiAbuse*GID+-, 0.99649,
>       X-AntiAbuse*Sender+#+#+-, 0.99649,
>       X-AntiAbuse*track+abuse, 0.99649,
>       X-AntiAbuse*header+was, 0.99649,
>       X-AntiAbuse*header+#+#+#+track, 0.99649,
>       X-AntiAbuse*was+#+to, 0.99649,
>       X-AntiAbuse*Originator+Caller, 0.99649
> 
> According to the scoring of the listed tokens, I think this message
> should be marked as ham, not as spam.
>
I think you are mixing things up here. If the result is "Spam", then the tokens
shown are spam tokens. See this old CHANGELOG entry:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[20040819.0800] jonz: added X-DSPAM-Factors

added determining factors header to emails containing a list of tokens that
played a role in the decision. if multiple algorithms are defined, only one
is used. if the message is spam, the factor set from an algorithm returning
a spam result will be used.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
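To check this against your own headers, here is a small Python sketch that splits an X-DSPAM-Factors value into (token, probability) pairs. It assumes the layout seen in the FP above (a leading count, then alternating token and value fields separated by commas) and that no token itself contains a comma; `parse_factors` is a hypothetical helper, not part of DSPAM. A value near 1.0 pushes toward spam and a value near 0.0 toward ham, so every factor in this message leans spam.

```python
from typing import List, Tuple

# Value of the X-DSPAM-Factors header from the FP above (folded lines kept).
FACTORS = """15,
      X-AntiAbuse*Original+#+-, 0.99649,
      X-AntiAbuse*Caller+#+GID, 0.99649,
      X-AntiAbuse*Sender+#+Domain, 0.99649,
      X-AntiAbuse*please+#+it, 0.99649,
      X-AntiAbuse*with+#+#+report, 0.99649,
      X-AntiAbuse*to+#+abuse, 0.99649,
      X-AntiAbuse*Primary+#+-, 0.99649,
      X-AntiAbuse*Original+Domain, 0.99649,
      X-AntiAbuse*GID+-, 0.99649,
      X-AntiAbuse*Sender+#+#+-, 0.99649,
      X-AntiAbuse*track+abuse, 0.99649,
      X-AntiAbuse*header+was, 0.99649,
      X-AntiAbuse*header+#+#+#+track, 0.99649,
      X-AntiAbuse*was+#+to, 0.99649,
      X-AntiAbuse*Originator+Caller, 0.99649"""


def parse_factors(value: str) -> List[Tuple[str, float]]:
    """Split 'N, token, prob, token, prob, ...' into (token, prob) pairs.

    Assumes no token contains a comma; whitespace left over from header
    folding is stripped from every field.
    """
    fields = [f.strip() for f in value.split(",")]
    count, rest = int(fields[0]), fields[1:]
    if len(rest) % 2:
        raise ValueError("token/probability fields are not paired")
    pairs = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest), 2)]
    if len(pairs) != count:
        raise ValueError(f"header announces {count} factors, found {len(pairs)}")
    return pairs


if __name__ == "__main__":
    for token, prob in parse_factors(FACTORS):
        # Values above 0.5 push toward spam, values below toward ham.
        leaning = "spam" if prob > 0.5 else "ham"
        print(f"{prob:.5f}  {leaning:4}  {token}")
```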


> Relevant values from dspam.conf:
> 
> TrainingMode teft
> ImprobabilityDrive on
> Algorithm graham burton
>
The 15 factors you see in your mail are the ones from Graham. Burton would
produce 27.
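To see why 15 factors of 0.99649 add up to a spam verdict, here is a sketch of the textbook combining rule from Paul Graham's "A Plan for Spam": keep the 15 token probabilities farthest from the neutral 0.5 and combine them as P = prod(p) / (prod(p) + prod(1 - p)). This is the published formula, not DSPAM's source code; DSPAM's own details (BCR p-values, the Burton variant, the Improbability Drive) are not modeled, and `graham_combine` is a hypothetical name. With the values from this FP the result rounds to 1.0000, which agrees with the X-DSPAM-Probability header.

```python
from math import prod
from typing import Iterable


def graham_combine(token_probs: Iterable[float], max_tokens: int = 15) -> float:
    """Combine per-token spam probabilities Graham-style.

    Keeps the `max_tokens` values farthest from 0.5 (the "most interesting"
    tokens) and applies P = prod(p) / (prod(p) + prod(1 - p)).
    """
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:max_tokens]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


if __name__ == "__main__":
    # All 15 factors in the FP carry the same spam-leaning value.
    print(f"{graham_combine([0.99649] * 15):.4f}")  # prints 1.0000
```

Because every token is multiplied in, a single header that contributes many near-identical spam-leaning tokens can dominate the result on its own.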


> Tokenizer osb
> PValue bcr
> 
> All of the above with a git tip checkout from 2011-03-01.
> 
> Kind regards,
> 
>       Tom
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić


> FWIW: I added the X-AntiAbuse header to the IgnoreHeader list after
> reviewing this message, because I concluded that the header is pretty
> useless for classification.
> 
Yes, especially in your case, since you use TEFT. If you used something like
TOE, things would be different. But with TEFT you are pretty much weakening
your data with headers that have a static value and appear too often in regular
messages.
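The difference comes from when each mode trains: TEFT (train everything) learns from every processed message, while TOE (train on error) learns only from misclassifications. The toy Python model below, assuming a header token that occurs in every message and a hypothetical 900 spam / 100 ham mix with five mistakes, counts how often such a token gets trained under each mode. It only illustrates training frequency; it does not model how DSPAM turns those counts into probabilities, and `token_counts` is a made-up helper.

```python
from typing import Sequence, Tuple


def token_counts(labels: Sequence[str], predictions: Sequence[str], mode: str) -> Tuple[int, int]:
    """Return (spam_hits, ham_hits) trained for a token present in every message.

    mode "teft": train on every message.
    mode "toe":  train only when the prediction was wrong.
    """
    if mode not in ("teft", "toe"):
        raise ValueError(f"unknown training mode: {mode}")
    spam_hits = ham_hits = 0
    for actual, predicted in zip(labels, predictions):
        if mode == "toe" and actual == predicted:
            continue  # TOE leaves correctly classified messages untrained
        if actual == "spam":
            spam_hits += 1
        else:
            ham_hits += 1
    return spam_hits, ham_hits


if __name__ == "__main__":
    labels = ["spam"] * 900 + ["ham"] * 100
    predictions = list(labels)
    predictions[:3] = ["ham"] * 3    # three missed spams
    predictions[-2:] = ["spam"] * 2  # two false positives
    print("TEFT", token_counts(labels, predictions, "teft"))  # (900, 100)
    print("TOE ", token_counts(labels, predictions, "toe"))   # (3, 2)
```

Under TEFT the static token is reinforced by all 1000 messages; under TOE only the five mistakes touch it.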


_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
