-------- Original Message --------
> Date: Sun, 13 Jan 2008 20:10:12 +1000
> From: Mark Constable <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: Re: [dspam-users] Newbie
> On 2008-01-13 07:48 pm, Dudi Goldenberg wrote:
> > >I grabbed the source package and rebuilt it but I am
> > >still getting the same error above.
> >
> > I run a lenny/sid combo, no issues.
>
> Doh, you are right. I think my system may have been picking
> up the self compiled CVS version I created previously in
> /usr/local/bin. So far so good, seems it is working okay
> with my v3.6.8 database. Looks like it'll take me a month
> to get up to 99+% at the rate I am going.
>
A month to get up to 99%? How much mail do you receive on your account per month? What version of DSPAM are you using? Have you considered pretraining and/or using groups with DSPAM?

I have a corpus here that I use for pretraining. It is asymmetric: 2'976'942 spam mails and 526'145 ham mails, 3'503'087 mails in total. I train in a special way to get higher accuracy with a small number of tokens in the database. After the first pass I end up with around 329'550 tokens in the dspam_token_data table and around 15MB of total data. 15MB is not much, so I can keep the training data of that group in memory all the time.

DSPAM is fast and often takes 0.0x or 0.x seconds to classify a mail; the storage is the part that slows DSPAM down. Even so, on a moderate CPU (AMD Athlon 1.2GHz) I classify on average 3 to 7 mails per second with one single agent. That includes all stages involved: reading the mail, starting the DSPAM agent, getting statistical data from the storage engine (MySQL in my case), tokenizing the mail, doing the probability calculation, etc.

Anyway, my point is that the group functionality of DSPAM can help you and your users get higher accuracy. In general I have an accuracy over 99% from the moment I enable a new domain and/or user. All of this comes from some pretraining and from using groups in DSPAM.
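To put the corpus figures above in perspective, here is a small back-of-the-envelope sketch in Python. It only redoes the arithmetic on the numbers quoted in this mail. The variable names are made up for illustration, 1MB is assumed to mean 2**20 bytes, and the bytes-per-token figure is only a rough average because the 15MB covers all training data, not just the token table.

```python
# Corpus sizes as quoted in the mail.
spam_mails = 2_976_942
ham_mails = 526_145
total_mails = spam_mails + ham_mails

# How asymmetric is the corpus?
spam_share = 100.0 * spam_mails / total_mails
spam_per_ham = spam_mails / ham_mails

# Rough storage cost per token after the first training pass.
tokens = 329_550
db_bytes = 15 * 1024 * 1024  # assumption: "15MB" means 15 * 2**20 bytes
bytes_per_token = db_bytes / tokens

# 3 to 7 mails per second, expressed as seconds per mail.
slowest_secs_per_mail = 1.0 / 3
fastest_secs_per_mail = 1.0 / 7

print(f"total mails:     {total_mails}")
print(f"spam share:      {spam_share:.1f}%")
print(f"spam per ham:    {spam_per_ham:.1f}")
print(f"bytes per token: {bytes_per_token:.0f} (rough average)")
print(f"secs per mail:   {fastest_secs_per_mail:.2f} to {slowest_secs_per_mail:.2f}")
```

So the corpus is roughly 85% spam, and the end-to-end time per mail works out to a few tenths of a second, which fits the 0.x seconds mentioned above.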
> # dspam_stats -H [EMAIL PROTECTED]
> [EMAIL PROTECTED]:
>     TP True Positives:       509
>     TN True Negatives:       556
>     FP False Positives:        1
>     FN False Negatives:      187
>     SC Spam Corpusfed:         0
>     NC Nonspam Corpusfed:      0
>     TL Training Left:       1943
>     SHR Spam Hit Rate      73.13%
>     HSR Ham Strike Rate:    0.18%
>     OCA Overall Accuracy:  85.00%
>
> --markc
>

Steve
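
P.S. For anyone wondering how the three percentages in the dspam_stats output quoted above relate to the four counters, here is a minimal Python sketch. The formulas are my assumption of the usual definitions (hit rate = TP / (TP + FN), strike rate = FP / (FP + TN), accuracy = correct / total), not code taken from DSPAM, and the function name is made up. They do reproduce the quoted figures.

```python
def dspam_rates(tp, tn, fp, fn):
    """Return (spam hit rate, ham strike rate, overall accuracy) in percent.

    Assumed definitions, not taken from the DSPAM source.
    """
    shr = 100.0 * tp / (tp + fn)                   # share of spam that was caught
    hsr = 100.0 * fp / (fp + tn)                   # share of ham wrongly flagged
    oca = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # share of all mail classified correctly
    return shr, hsr, oca


# Counters from the quoted dspam_stats output.
shr, hsr, oca = dspam_rates(tp=509, tn=556, fp=1, fn=187)
print(f"SHR {shr:.2f}%  HSR {hsr:.2f}%  OCA {oca:.2f}%")
# prints: SHR 73.13%  HSR 0.18%  OCA 85.00%
```

The 187 false negatives are what hold the overall accuracy at 85%. The single false positive hardly matters.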
