Re: [Dspam-user] Understanding classification: dspam factors?

Tom Hendrikx Tue, 26 Apr 2011 00:52:36 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 23/04/11 12:24, Stevan Bajić wrote:
> On Fri, 22 Apr 2011 10:17:17 +0200
> Tom Hendrikx <t...@whyscream.net> wrote:
> 
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi,
>>
> Hello Tom,
> 
> 
>> In my current setup I just received my first FP.
>>
> I hope it is an old setup? One FP is not a big thing.
> 
>> Dspam is set up to add
>> the dspam-factors header to classified e-mails, but after reviewing the
>> data, I don't understand why dspam decided to classify the message as
>> spam. Also the X-DSPAM-Improbability header has weird contents.
>>
>> Does the dspam_factors header contain all of the tokens used to classify
>> the message, or only a subset of them?
>>
> All of them. But if you use more than one algorithm then only the first one
> will be shown in X-DSPAM-Factors.
>


Ah, I see. I use "graham burton" now. Would the classification change if
I switched that to "burton graham" in order to see more factors?

> 
>> Because the headers in the FP
>> message do not explain why it happens:
>>
>> X-DSPAM-Result: Spam
>> X-DSPAM-Processed: Fri Apr 22 01:01:29 2011
>> X-DSPAM-Confidence: 0.9963
>> X-DSPAM-Improbability: 1 in 26939 chance of being ham
>> X-DSPAM-Probability: 1.0000
>> X-DSPAM-Signature: 1,4db0b74991741873512032
>> X-DSPAM-Factors: 15,
>>      X-AntiAbuse*Original+#+-, 0.99649,
>>      X-AntiAbuse*Caller+#+GID, 0.99649,
>>      X-AntiAbuse*Sender+#+Domain, 0.99649,
>>      X-AntiAbuse*please+#+it, 0.99649,
>>      X-AntiAbuse*with+#+#+report, 0.99649,
>>      X-AntiAbuse*to+#+abuse, 0.99649,
>>      X-AntiAbuse*Primary+#+-, 0.99649,
>>      X-AntiAbuse*Original+Domain, 0.99649,
>>      X-AntiAbuse*GID+-, 0.99649,
>>      X-AntiAbuse*Sender+#+#+-, 0.99649,
>>      X-AntiAbuse*track+abuse, 0.99649,
>>      X-AntiAbuse*header+was, 0.99649,
>>      X-AntiAbuse*header+#+#+#+track, 0.99649,
>>      X-AntiAbuse*was+#+to, 0.99649,
>>      X-AntiAbuse*Originator+Caller, 0.99649
>>
>> According to the scoring of the listed tokens, I think this message
>> should be marked as ham, not as spam.
>>
> I think you are mixing things up here.

The first thing I mixed up: I was under the impression that a high
'score' for a token meant low spamminess. My bad, it's the other way
around.
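
To make that reading concrete, here is a minimal Python sketch that
splits the X-DSPAM-Factors header quoted above into (token, value)
pairs and labels each one. It is not part of DSPAM: parse_factors is a
hypothetical helper, and treating 0.5 as the neutral point is an
assumption based on the values being probabilities.

```python
# Header text copied from the false-positive message quoted in this thread.
HEADER = """X-DSPAM-Factors: 15,
     X-AntiAbuse*Original+#+-, 0.99649,
     X-AntiAbuse*Caller+#+GID, 0.99649,
     X-AntiAbuse*Sender+#+Domain, 0.99649,
     X-AntiAbuse*please+#+it, 0.99649,
     X-AntiAbuse*with+#+#+report, 0.99649,
     X-AntiAbuse*to+#+abuse, 0.99649,
     X-AntiAbuse*Primary+#+-, 0.99649,
     X-AntiAbuse*Original+Domain, 0.99649,
     X-AntiAbuse*GID+-, 0.99649,
     X-AntiAbuse*Sender+#+#+-, 0.99649,
     X-AntiAbuse*track+abuse, 0.99649,
     X-AntiAbuse*header+was, 0.99649,
     X-AntiAbuse*header+#+#+#+track, 0.99649,
     X-AntiAbuse*was+#+to, 0.99649,
     X-AntiAbuse*Originator+Caller, 0.99649"""


def parse_factors(header):
    """Hypothetical helper: return a list of (token, value) pairs."""
    # Drop the header name, then handle one "token, value" entry per line.
    _, _, value = header.partition(":")
    lines = [line.strip().rstrip(",") for line in value.strip().splitlines()]
    declared = int(lines[0])  # the leading "15" is the number of factors
    factors = []
    for line in lines[1:]:
        if not line:
            continue
        # Split on the LAST comma so a comma inside a token cannot break parsing.
        token, prob = line.rsplit(",", 1)
        factors.append((token.strip(), float(prob)))
    if len(factors) != declared:
        raise ValueError(f"expected {declared} factors, found {len(factors)}")
    return factors


factors = parse_factors(HEADER)
for token, prob in factors:
    # Assumption: values above 0.5 lean towards spam, below 0.5 towards ham.
    label = "spammy" if prob > 0.5 else "innocent"
    print(f"{prob:.5f}  {label:8}  {token}")
```

Every factor in this message comes out as spammy, which is consistent
with the "Spam" result.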

> If the result is "Spam" then the shown tokens are spam tokens. See this old 
> CHANGELOG entry:
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> [20040819.0800] jonz: added X-DSPAM-Factors
> 
> added determining factors header to emails containing a list of tokens that
> played a role in the decision. if multiple algorithms are defined, only one
> is used. if the message is spam, the factor set from an algorithm returning
> a spam result will be used.
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> 

The debug output shows far more tokens than 15 or 27 (I count 1350 in
my test). How are the 15 or 27 tokens used for classification selected
from that larger set? Are they the most spammy tokens (for a message
classified as spam) or the most innocent ones (for a message classified
as innocent)?

Hmm maybe I should read more about these algorithms... :)
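
For reference, a minimal Python sketch of the scheme Paul Graham
describes in "A Plan for Spam": take the 15 tokens whose probability
lies farthest from the neutral 0.5, in either direction, and combine
them with the naive Bayes formula. This is only the published
algorithm, not code taken from DSPAM, whose implementation may differ.
The function names and the near-neutral filler values are made up for
illustration.

```python
from math import prod


def most_interesting(probs, n=15):
    """Return the n probabilities farthest from the neutral value 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]


def graham_combine(probs):
    """Combine token probabilities as in Graham's 'A Plan for Spam'."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical message: the 15 strong tokens from the header in this thread
# plus 100 made-up, near-neutral tokens standing in for the rest.
all_probs = [0.99649] * 15 + [0.5, 0.45, 0.6, 0.4, 0.55] * 20

selected = most_interesting(all_probs, n=15)
probability = graham_combine(selected)
print(f"{probability:.4f}")  # prints 1.0000
```

Under this scheme the selection is by distance from 0.5, so strongly
innocent tokens would compete with strongly spammy ones for the 15
slots. With fifteen values of 0.99649 the combined result rounds to
1.0000, which matches the X-DSPAM-Probability header above.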


> 
>> Relevant values from dspam.conf:
>>
>> TrainingMode teft
>> ImprobabilityDrive on
>> Algorithm graham burton
>>
> The 15 factors you see in your mail are the ones from Graham. Burton would
> produce 27.
> 
> 
>> Tokenizer osb
>> PValue bcr
>>
>> All of the above with a git tip checkout from 2011-03-01.
>>
>> Kind regards,
>>
>>      Tom
>>


- -- 

New PGP key: 7D54EFF5
Fingerprint: C26F 374F 5E13 157B 5B42  7A1B 93DF 319D 7D54 EFF5
http://www.whyscream.net/key-transition-2011-03-30.txt.asc
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJNtnZTAAoJEJPfMZ19VO/1wgcQAJyEjpd6EPapEQFWpLSxijPT
ibn3LyYfotY3dTLbWqgRzcI5+WkIXau8xjb63ZnfLckW/pzoKsXGbTMQcoik7N+Z
qc17jvjxRIX0m4bA7gO5yZaywADpO08YmsO+oUTO1Juv+rVXxJaokQrQKgYEa6Ui
ZQC/pAJora0vL5flfaOPZZ6fkb/J60VHQBCSRRUzM+b3MEdoQnbnBXLi1VlJDs04
hTyUI4LT7xFaEGS8KrYBYzRp/ioQ88VJwpCU9WFcndLjpwBqtbVQujRQxLFROy/z
C8kxyTBVbhmcs538D0AFobRMqU7vmviflYEfdbIsI4r0nqxxI+ww8z61D37axylD
QkaCNAX+mGI4b8QO6451pkHq0lM27YKRWkGAh0LkB+8wQ4VlC0W84Ygt1/BiFNMp
kQhcwVAdjQsd2KYNdH3PPCPbOKh5D3o2psvKA2N7EwXxRO48O6Y9fzeEPL9qYLBf
hiOQn5stBo1I8KQs17XhaeRWvJBFd8xPlcGaw+qaimJJFbCeQRjaxC34xb9GF805
k0+R+irexgesaYFQZ1fgKjRTrcVWJZx8+C9HnlNu4R2u+93NtE5of4EJGTutPZvy
ouTBS8X7pkk6TZHDc7rR+j8KOOc/nkObKnF1Li18F32dZG3g2DbSiT0+Gnfek156
KMNmLBVC+tSGbh/CFEkC
=ymjm
-----END PGP SIGNATURE-----

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
