-------- Original Message --------
> Date: Tue, 25 Aug 2009 21:33:19 +0200
> From: Sven Karlsson <[email protected]>
> To: [email protected]
> Subject: [Dspam-user] high level of missed ham, but all factors at 0.01000

> Hello,
> 
Hello Sven,


> I'm getting a lot of false positives (in the range of 5-10 percent,
> perhaps more) in our dspam setup, so I enabled showFactors to figure
> out what's going on.
> 
> In the example below, all factors are 0.01, but it is still classified
> as spam, albeit with a confidence of 0.6.
> 
> 
> X-DSPAM-Confidence: 0.6000
> X-DSPAM-Improbability: 1 in 151 chance of being ham
> X-DSPAM-Probability: 0.0000
> X-DSPAM-Signature: 143,4a942ec0228291108619900
> X-DSPAM-Factors: 27,
>         vingar+se, 0.01000,
>         right, 0.01000,
>         right, 0.01000,
>         Vingar+LTD, 0.01000,
>         Vingar+LTD, 0.01000,
>         type", 0.01000,
>         a+{, 0.01000,
>         VA, 0.01000,
>         (+", 0.01000,
>         LTD+|, 0.01000,
>         LTD+|, 0.01000,
>         bredaste+sortiment, 0.01000,
>         style="font, 0.01000,
>         style="font, 0.01000,
>         X-NS-Message-Id*9AFF3889C111}+hage, 0.01000,
>         Subject*Stora+Vingar, 0.01000,
>         From*Hage Vingar LTD <[email protected]>, 0.01000,
>         X-Mailer*(www.effectivestudios.com), 0.01000,
>         none, 0.01000,
>         none, 0.01000,
>         hur, 0.01000,
>         storgatan, 0.01000,
>         Subject*Vingar, 0.01000,
>         va+Vingars, 0.01000,
>         Received*vingar.se, 0.01000,
>         Received*vingar.se, 0.01000,
>         X-NS-Message-Id*BD74, 0.01000
> 
Uhh... bad, bad, bad! I see too many HTML tags there. This is surely not DSPAM 
3.9.0, right?


> Other strangeness: most factors displayed seem to be from the header,
> such as month*day pairs (although not in this example). I would assume
> that the email content would account for better indication of
> ham/spam.
> 
That is certainly true, but you are probably using one of the Bayesian 
algorithms, and they only use the most significant tokens (15 or more, but 
with an upper limit). If you want all tokens to be considered, you should use 
naïve, as it processes all tokens.
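
To illustrate the difference, here is a rough Python sketch. It is not DSPAM's 
actual code: the function names are made up, "most significant" is assumed to 
mean "furthest from the neutral 0.5", and the combination step is the formula 
from Paul Graham's "A Plan for Spam". Probabilities are assumed to lie strictly 
between 0 and 1 (your factors bottom out at 0.01).

```python
from math import prod


def most_significant(probs, limit):
    """Keep the `limit` token probabilities furthest from the neutral 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:limit]


def graham_combine(probs):
    """Combine per-token spam probabilities with Graham's formula."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def classify(probs, limit=None):
    """Return a combined spam probability.

    limit=None considers every token (naive-style); otherwise only the
    `limit` most significant tokens are combined (hypothetical window).
    """
    chosen = probs if limit is None else most_significant(probs, limit)
    return graham_combine(chosen)
```

With a window of 27 tokens that are all at 0.01, as in your X-DSPAM-Factors 
list, classify([0.01] * 27, limit=27) comes out very close to 0. A small window 
can also flip the result: two tokens at 0.99 plus fifty at 0.4 score as spam 
with limit=2, but as ham when every token is considered.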


> Even more strangeness: The "improbability drive" shows "1 in 151
> chance of being ham" or "1 in 151 chance of being spam" in 95% of the
> cases (of 2146 examined emails). I would expect a lot more variation
> here. Does this indicate a problem?
> 
YES! Something is not right with the statistical counters. Does this happen 
only on your setup, or do other users have the same issue?


> The setup scenario is for about 1000 mailboxes, using a global user,
> TOE training and initial corpus of about 5000 manually sorted
> spam/ham. There is a central periodic TOE training done about once a
> week for a sample of all messages, training the globaluser.
> 
I don't understand this. What are you training once a week? A new, fresh set 
of HAM/SPAM, or the same 5000 manually sorted HAM/SPAM messages?


> Algorithm graham burton
>
AHA! So there we are. That's the reason for the reduced number of tokens in the 
showFactors output. This is not a bad thing, by the way. It is not necessary to 
process all tokens to get a good result.


> PValue graham
> 
Uhh... if you have that in PValue, then this must be DSPAM 3.6.8 or older. Am I 
right?


> libmysql_drv storage driver
> 
> Using dspam 3.6.8 shipped with Debian.
> 
Aha. Yes. I was right. DSPAM 3.6.8. Have you considered updating your DSPAM 
setup? To 3.8.0 at least. DSPAM 3.6.8 does not offer you much to improve the 
situation you are currently facing.


> Any ideas what could be wrong?
>
Besides the 3.6.8 version of DSPAM? Not much (if anything). From what I see 
above, you can't improve your situation much with 3.6.8.


> Any way to debug the factors/tokens?
> 
Debug in what way?
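
If you only want to see where the factors come from, a small script could 
tally them. The Python sketch below is one way to do it, under some 
assumptions: the function names are made up, the input is the X-DSPAM-Factors 
value with one factor per line as in your example (without the "> " quoting), 
and any token containing "*" is treated as coming from a header.

```python
def parse_factors(header_value):
    """Parse an X-DSPAM-Factors value into (token, probability) pairs."""
    lines = header_value.strip().splitlines()
    factors = []
    for line in lines[1:]:  # the first line only holds the count, e.g. "27,"
        entry = line.strip().rstrip(",")
        # Split on the last ", " so odd characters in the token survive.
        token, sep, value = entry.rpartition(", ")
        if not sep:
            continue  # not a "token, probability" line
        factors.append((token, float(value)))
    return factors


def summarize(factors):
    """Count header tokens (assumed: name contains '*') versus the rest."""
    header = sum(1 for token, _ in factors if "*" in token)
    return {"header": header, "body": len(factors) - header}
```

Running summarize(parse_factors(value)) over a batch of messages would show 
whether header tokens really dominate the factors.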


> Best Regards
> 
Steve