Hi,
Some of the answers to your questions were already in my e-mail:

- version: git tip from 2011-03-01, commit f02393585adca32778a176cfdf57e3bdef7b9496 according to git log
- postfix passes mail to the dspam daemon over LMTP
- dspam is set up for a single shared group, of which I am currently the only user/trainer
- I train very accurately, and as I am currently the only user, I see all
  messages that are retrained. Training is done only with the
  dovecot-antispam plugin for correcting FP/FN; no corpus or inoculation
  is being used.

Statistics for the shared group:

global:
    TP True Positives:                  39
    TN True Negatives:                5062
    FP False Positives:                  1
    FN False Negatives:                 33
    SC Spam Corpusfed:                   0
    NC Nonspam Corpusfed:                0
    TL Training Left:                    0
    SHR Spam Hit Rate:              54.17%
    HSR Ham Strike Rate:             0.02%
    PPV Positive predictive value:  97.50%
    OCA Overall Accuracy:           99.34%

I think my training policy is OK. I don't have a long list of IgnoreHeaders, but that does not matter to my question at all.

However, none of this answers my initial question: does dspam_factors represent all data used for classification? And if it does, why would dspam ever decide that the example message was spam (with an astounding confidence)?

--
Tom

On 22/04/11 10:36, Ibrahim Harrani wrote:
> Hi Tom,
>
> Which dspam version are you using? How do you train? Which tokenizer
> do you use during training and after training?
> Dspam is very sensitive about training. If you don't train very well,
> or if you train too much, you may have trouble.
> Also, there are many headers you should ignore. You can get the list from:
> http://sourceforge.net/apps/mediawiki/dspam/index.php?title=Working_DSPAM%2BPOSTFIX%2BMYSQL%2BCLAMAV_Setup_by_PaulC
>
> Also, if you uploaded a spam/ham corpus from Windows to Unix/Linux, you
> should strip the ^M characters by adding the following line to dspam.conf.
> I had this problem before; in that case dspam was only checking the
> headers for the classification.
>
> #Specifying 'lineStripping' causes DSPAM to strip ^M's from messages passed
> # in.
> Broken lineStripping
>
> If you have the same problem, you may have to re-train your dspam data.
>
> Thanks.
>
> On Fri, Apr 22, 2011 at 9:17 AM, Tom Hendrikx <t...@whyscream.net> wrote:
>> Hi,
>>
>> In my current setup I just received my first FP. Dspam is set up to add
>> the dspam-factors header to classified e-mails, but after reviewing the
>> data, I don't understand why dspam decided to classify the message as
>> spam. Also, the X-DSPAM-Improbability header has weird contents.
>>
>> Does the dspam_factors header contain all of the tokens used to classify
>> the message, or only a subset of them? The headers in the FP message
>> do not explain why it happened:
>>
>> X-DSPAM-Result: Spam
>> X-DSPAM-Processed: Fri Apr 22 01:01:29 2011
>> X-DSPAM-Confidence: 0.9963
>> X-DSPAM-Improbability: 1 in 26939 chance of being ham
>> X-DSPAM-Probability: 1.0000
>> X-DSPAM-Signature: 1,4db0b74991741873512032
>> X-DSPAM-Factors: 15,
>>     X-AntiAbuse*Original+#+-, 0.99649,
>>     X-AntiAbuse*Caller+#+GID, 0.99649,
>>     X-AntiAbuse*Sender+#+Domain, 0.99649,
>>     X-AntiAbuse*please+#+it, 0.99649,
>>     X-AntiAbuse*with+#+#+report, 0.99649,
>>     X-AntiAbuse*to+#+abuse, 0.99649,
>>     X-AntiAbuse*Primary+#+-, 0.99649,
>>     X-AntiAbuse*Original+Domain, 0.99649,
>>     X-AntiAbuse*GID+-, 0.99649,
>>     X-AntiAbuse*Sender+#+#+-, 0.99649,
>>     X-AntiAbuse*track+abuse, 0.99649,
>>     X-AntiAbuse*header+was, 0.99649,
>>     X-AntiAbuse*header+#+#+#+track, 0.99649,
>>     X-AntiAbuse*was+#+to, 0.99649,
>>     X-AntiAbuse*Originator+Caller, 0.99649
>>
>> According to the scoring of the listed tokens, I think this message
>> should be marked as ham, not as spam. Relevant values from dspam.conf:
>>
>>     TrainingMode teft
>>     ImprobabilityDrive on
>>     Algorithm graham burton
>>     Tokenizer osb
>>     PValue bcr
>>
>> All of the above with a git tip checkout from 2011-03-01.
>>
>> Kind regards,
>>
>> Tom
>>
>>
>> FWIW: I added the X-AntiAbuse header to IgnoreHeaders after
>> reviewing this message, because I concluded that the header is pretty
>> useless for classification.
>>
>> ------------------------------------------------------------------------------
>> Fulfilling the Lean Software Promise
>> Lean software platforms are now widely adopted and the benefits have been
>> demonstrated beyond question. Learn why your peers are replacing JEE
>> containers with lightweight application servers - and what you can gain
>> from the move.
>> http://p.sf.net/sfu/vmware-sfemails
>> _______________________________________________
>> Dspam-user mailing list
>> Dspam-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspam-user
>>

--
New PGP key: 7D54EFF5
Fingerprint: C26F 374F 5E13 157B 5B42 7A1B 93DF 319D 7D54 EFF5
http://www.whyscream.net/key-transition-2011-03-30.txt.asc
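The statistics for the shared group in the message are internally consistent. As a quick check, the sketch below recomputes the four rates from the raw counts, assuming the usual confusion-matrix definitions with spam as the positive class (SHR = TP/(TP+FN), HSR = FP/(FP+TN), PPV = TP/(TP+FP), OCA = (TP+TN)/total). `dspam_rates` is a hypothetical helper, not part of DSPAM.

```python
def dspam_rates(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Recompute the SHR/HSR/PPV/OCA rates (as percentages) from raw counts.

    Hypothetical helper: the formulas are the standard confusion-matrix
    definitions, assumed to match the figures DSPAM reports.
    """
    total = tp + tn + fp + fn
    return {
        "SHR": 100.0 * tp / (tp + fn),     # Spam Hit Rate: share of spam caught
        "HSR": 100.0 * fp / (fp + tn),     # Ham Strike Rate: share of ham misfiled as spam
        "PPV": 100.0 * tp / (tp + fp),     # Positive predictive value
        "OCA": 100.0 * (tp + tn) / total,  # Overall Accuracy
    }


if __name__ == "__main__":
    # Counts copied from the shared-group statistics in the message.
    rates = dspam_rates(tp=39, tn=5062, fp=1, fn=33)
    for name, value in rates.items():
        print(f"{name}: {value:.2f}%")
```

With these counts it prints SHR 54.17%, HSR 0.02%, PPV 97.50% and OCA 99.34%, the same values as in the statistics.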
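The tokens in the X-DSPAM-Factors header of the original message have the shape produced by the osb tokenizer set in dspam.conf: each token pairs a word with one of the next few words of the same header, writes one `#` per skipped word, and is prefixed with the header name. The sketch below reproduces that shape. The five-word window (up to three skipped words) is an assumption inferred from the `header+#+#+#+track` token; the function name, the word-splitting rule and the sample X-AntiAbuse text (a typical cPanel header line) are illustrative, and DSPAM's real tokenizer may differ in its details.

```python
import re


def osb_tokens(text: str, prefix: str = "", window: int = 5) -> list:
    """Sketch of orthogonal sparse bigram (OSB) tokens in DSPAM's style.

    Each word is paired with each of the following (window - 1) words, and
    every skipped word is written as '#'. The splitting rule and window size
    are assumptions, not DSPAM's exact implementation.
    """
    words = re.findall(r"[^\s,/]+", text)  # split on whitespace, commas and slashes
    tokens = []
    for i, first in enumerate(words):
        for dist in range(1, window):
            j = i + dist
            if j >= len(words):
                break
            pair = [first] + ["#"] * (dist - 1) + [words[j]]
            tokens.append(prefix + "+".join(pair))
    return tokens


if __name__ == "__main__":
    # Hypothetical sample: a typical cPanel X-AntiAbuse header line.
    line = "This header was added to track abuse, please include it with any abuse report"
    for tok in osb_tokens(line, prefix="X-AntiAbuse*"):
        print(tok)
```

Among the printed tokens are `header+was`, `header+#+#+#+track`, `was+#+to`, `to+#+abuse`, `track+abuse`, `please+#+it` and `with+#+#+report`, each prefixed with `X-AntiAbuse*`, which is how they appear in the factors list.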
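On how the listed factors combine into a verdict: in Paul Graham's scheme, which the `graham` entry of the Algorithm setting is named after, per-token spam probabilities are combined as P = Πp / (Πp + Π(1 − p)). The sketch below applies that formula to the 15 factors from the X-DSPAM-Factors header, assuming that each listed value is a per-token spam probability (values near 1 being spammy) and that DSPAM's graham implementation follows this formula. It does not model the burton algorithm, the bcr PValue, or how X-DSPAM-Confidence is derived; `graham_combine` is a hypothetical name.

```python
import math


def graham_combine(probs: list) -> float:
    """Combine per-token spam probabilities with Paul Graham's formula.

    P = prod(p) / (prod(p) + prod(1 - p)), evaluated in log space so that
    long products do not underflow. Assumes every p is strictly between 0 and 1.
    """
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    # Dividing through by prod(p) gives 1 / (1 + prod(1 - p) / prod(p)).
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


if __name__ == "__main__":
    # All 15 factors in the X-DSPAM-Factors header have the value 0.99649.
    factors = [0.99649] * 15
    print(f"{graham_combine(factors):.4f}")
```

Under these assumptions the combined value rounds to 1.0000, which matches the X-DSPAM-Probability header of the message.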