Hello,

I'm getting a lot of false positives (in the range of 5-10 percent, perhaps
more) in our dspam setup, so I enabled showFactors to figure out what's
going on.

In the example below, all factors are 0.01, but it is still classified as spam,
albeit with a confidence of 0.6.


X-DSPAM-Confidence: 0.6000
X-DSPAM-Improbability: 1 in 151 chance of being ham
X-DSPAM-Probability: 0.0000
X-DSPAM-Signature: 143,4a942ec0228291108619900
X-DSPAM-Factors: 27,
        vingar+se, 0.01000,
        right, 0.01000,
        right, 0.01000,
        Vingar+LTD, 0.01000,
        Vingar+LTD, 0.01000,
        type", 0.01000,
        a+{, 0.01000,
        VA, 0.01000,
        (+", 0.01000,
        LTD+|, 0.01000,
        LTD+|, 0.01000,
        bredaste+sortiment, 0.01000,
        style="font, 0.01000,
        style="font, 0.01000,
        X-NS-Message-Id*9AFF3889C111}+hage, 0.01000,
        Subject*Stora+Vingar, 0.01000,
        From*Hage Vingar LTD <[email protected]>, 0.01000,
        X-Mailer*(www.effectivestudios.com), 0.01000,
        none, 0.01000,
        none, 0.01000,
        hur, 0.01000,
        storgatan, 0.01000,
        Subject*Vingar, 0.01000,
        va+Vingars, 0.01000,
        Received*vingar.se, 0.01000,
        Received*vingar.se, 0.01000,
        X-NS-Message-Id*BD74, 0.01000
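
As a sanity check on the numbers above, here is a minimal sketch of the combining formula from Paul Graham's "A Plan for Spam". I am assuming this is roughly what the graham settings are based on. It is not taken from the dspam source, and the function name and the choice to combine all 27 listed factors are my own assumptions.

```python
from math import prod


def graham_combine(probabilities):
    """Combine per-token spam probabilities using Graham's formula.

    Assumption: this is the textbook formula, not dspam's actual code.
    """
    spam = prod(probabilities)                  # product of p_i
    ham = prod(1.0 - p for p in probabilities)  # product of (1 - p_i)
    return spam / (spam + ham)


# All 27 factors from the X-DSPAM-Factors header above are 0.01.
factors = [0.01] * 27
score = graham_combine(factors)
print(f"{score:.4f}")  # prints 0.0000
```

With every factor at 0.01 the combined value is effectively zero, which agrees with the X-DSPAM-Probability: 0.0000 header.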


Other strangeness: most of the factors displayed seem to come from the
headers, such as month*day pairs (although not in this example). I would
assume that the email content would be a better indicator of ham/spam.
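
To put a number on the header/body split, something like the following sketch could tally the factors of a message. It assumes that tokens never contain a comma and that tokens of the form Name*value come from headers. Both are guesses based on the example above, and the function names are hypothetical.

```python
def parse_factors(value):
    """Split an X-DSPAM-Factors value into (token, probability) pairs.

    Assumption: tokens contain no commas, so a plain split is enough.
    """
    parts = [part.strip() for part in value.split(",")]
    count, fields = int(parts[0]), parts[1:]
    pairs = [(fields[i], float(fields[i + 1]))
             for i in range(0, len(fields) - 1, 2)]
    if len(pairs) != count:
        raise ValueError(f"expected {count} factors, got {len(pairs)}")
    return pairs


def is_header_token(token):
    """Assumption: header tokens look like 'Subject*Vingar'."""
    return "*" in token


# Shortened sample built from factors in the example above.
sample = ("5, vingar+se, 0.01000, right, 0.01000, Subject*Vingar, 0.01000, "
          "Received*vingar.se, 0.01000, hur, 0.01000")
pairs = parse_factors(sample)
header_tokens = [token for token, _ in pairs if is_header_token(token)]
body_tokens = [token for token, _ in pairs if not is_header_token(token)]
print(len(header_tokens), len(body_tokens))  # prints 2 3
```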

Even more strangeness: The "improbability drive" shows "1 in 151
chance of being ham" or "1 in 151 chance of being spam" in 95% of the
cases (of 2146 examined emails). I would expect a lot more variation
here. Does this indicate a problem?

The setup is for about 1000 mailboxes, using a global user, TOE training
and an initial corpus of about 5000 manually sorted spam/ham messages. A
central periodic TOE training run is done about once a week on a sample of
all messages, training the globaluser.

Algorithm graham burton
PValue graham

libmysql_drv storage driver

Using dspam 3.6.8 shipped with Debian.

Any ideas what could be wrong?
Any way to debug the factors/tokens?

Best Regards

_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
