Hello Steve,

On Fri, 31 Jul 2009 01:09:18 +0200, "Steve" <[email protected]> wrote:
> -------- Original Message --------
>> Date: Thu, 30 Jul 2009 19:01:37 +0200
>> From: "Sebastian Toepfer" <[email protected]>
>> To: [email protected]
>> Subject: Re: [Dspam-user] Upgrade dspam 3.6.8 to 3.9.0-git
>
>> Hello Steve,
>>
> Hello Sebastian
>
>
>> thanks, my holiday is rescued :)
>>
> Why? What did I write that was good enough to rescue your holiday?
>
Everything: now I can make the switch and still have some of my holiday left :)

[...]
>
>> >
>> >> to change the tokenizer without retraining for the
>> >> users. Because I use dspam at home and the "users" have trained dspam for
>> >> about 3 years, and they'll kill me if they have to do it again :(
>> >>
>> > If I understand that right you are asking if you could shorten the
>> > training for the new installation by using old data. Right? Yes! You can
>> > do that. You could dump or copy the old data and import it on the new
>> > installation. But if I see that right then you are planning to change the
>> > tokenizer, and changing the tokenizer mostly means that the old data is
>> > useless.
>> >
>>
>> bad news ... I've read that other tokenizers are better,
>>
> Better in what? If it were so clear which tokenizer is the best then we
> would probably remove all the others. But it's not that easy. For some
> setups tokenizer A is better than tokenizer B and so on...
>
No, I don't know which is the best, but I found some references recommending
OSB or SBPH.
>
>> so why isn't it
>> possible to migrate the data from one tokenizer to another? Is it a problem
>> with how dspam creates the tokens, i.e. does it only work one way?
>>
> Yep. The reason is very easy:
> 1) Not all tokenizers use the same schema/pattern
> 2) There is no chain information saved inside the token
> 3) Computing from normal text to token is easy but the way back is hard
>
>
> I am now going to explain in depth how the tokenizers create the
> tokens/patterns. I do that because I hope new users will search the
> mailing list archives and stop asking the same question over and over. I
> will just show the token generating part. Internally DSPAM uses algorithms
> for calculating the probability and the confidence factor. I am not going
> to explain the latter two parts. Just the token creation. Besides the token
> creation, DSPAM uses different weights on the generated tokens depending on
> which tokenizer is used.
> I am not going to explain that either. I have
> done that already in the past, and the info about the weight of the tokens
> inside the tokenizers is explained there. If you need that info then please
> search the mailing list and read more about it there.
>
>
> So now the technical mumbo-jumbo. Let me explain:
> --------------------------------------------------
[...]

Thanks for this explanation. It's all clear now.
>
>
>> > 3 years of data is all fine and okay but to be honest you will not lose
>> > much. Just the first days will lead to more training but after a short
>> > time DSPAM will catch up and be very accurate.
>> >
>>
>> It's a small installation, only about 30,000 mails in these 3 years ... and
>> 20,000 of them are mine :) .. so I think it will take a year to reach the
>> current accuracy.
>>
> No way. A year? NEVER! Expect a bunch of corrections (in the 2-digit area)
> and you would easily be above 90% or even 95% already. Just take something
> like OSB or CHAIN. Don't go with WORD in your case.
>
See my question about the tokenizer; I'll switch it after this answer. I'll do
it and remove all training data.
>
>> Or how long do you think it takes with this low volume?
>> E.g.
>> one user has only 700 ham but 1500 spam (accuracy 91.40% - she loves
>> dspam :)).
>>
> Not much time. Really. And you could still pretrain a merged or
> shared,merged group and speed up the process. You can find spam corpora
> everywhere on the net (there are (almost) as many as grains of sand on the
> beach).
>
But which one is best for German users, when one user also receives English
newsletters/mailing lists? All the ones I've tested resulted in bad accuracy.
I hate false positives; they are the worst thing a spam filter can do. Look at
GMX ... you must check all your spam daily to find the newsletters :(. If false
positives stay at a low level, the users check the quarantine once a
week/month and everything is okay.
>
>> >
>> >> any other pitfalls?
>> >>
>> > Not really.
>> >
>>
>> Very good news.
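To make Steve's three reasons concrete, here is a simplified sketch of how WORD-, CHAIN-, OSB- and SBPH-style tokenizers might turn the same words into different token sets. It is not DSPAM's code: the token formats (`+` to join words, `#` for a skipped position), the window of 5 words, and the hypothetical `token_id` function (BLAKE2b truncated to 64 bits, standing in for the 64-bit hash value DSPAM keeps per token) are assumptions for illustration only. The real tokenizers also process headers and weight tokens differently.

```python
import hashlib

WINDOW = 5  # assumed sliding-window size for the sparse tokenizers (OSB/SBPH)


def word_tokens(words):
    """WORD-style: every word becomes a token on its own."""
    return list(words)


def chain_tokens(words):
    """CHAIN-style: single words plus every pair of adjacent words."""
    return list(words) + [f"{a}+{b}" for a, b in zip(words, words[1:])]


def osb_tokens(words, window=WINDOW):
    """OSB-style: pair each word with each of the previous window-1 words,
    marking every skipped position with '#' (format is illustrative)."""
    tokens = []
    for i, current in enumerate(words):
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break
            tokens.append(words[j] + "+#" * (dist - 1) + "+" + current)
    return tokens


def sbph_tokens(words, window=WINDOW):
    """SBPH-style: every combination of the previous window-1 words
    (each kept or replaced by '#'), always ending in the current word."""
    tokens = []
    for i, current in enumerate(words):
        prev = words[max(0, i - window + 1):i]
        for mask in range(2 ** len(prev)):
            parts = [w if mask & (1 << k) else "#" for k, w in enumerate(prev)]
            tokens.append("+".join(parts + [current]))
    return tokens


def token_id(token):
    """Hypothetical stand-in for DSPAM's 64-bit token hash (NOT the real
    function). Like any such hash, it cannot practically be reversed."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def id_set(tokenizer, words):
    """What a token database would keep: only the hashed IDs, no text."""
    return {token_id(t) for t in tokenizer(words)}


if __name__ == "__main__":
    words = "buy cheap watches online today".split()
    for name, fn in [("WORD", word_tokens), ("CHAIN", chain_tokens),
                     ("OSB", osb_tokens), ("SBPH", sbph_tokens)]:
        print(f"{name:5} {len(fn(words)):2} tokens")

    # Same words, different order: identical WORD data, different OSB data.
    a = "cheap watches buy online".split()
    b = "buy online cheap watches".split()
    print(id_set(word_tokens, a) == id_set(word_tokens, b),
          id_set(osb_tokens, a) == id_set(osb_tokens, b))
```

For the five-word sample this prints 5 WORD, 9 CHAIN, 10 OSB and 31 SBPH tokens, then `True False`. The last line is the point: two messages with the same words in a different order leave identical WORD token IDs but different OSB token IDs, so a database of hashed WORD tokens simply does not contain the word-order information an OSB or SBPH database would need.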
>>
> :)
>
>
>> >
>> >> I use dspam with mysql as backend and without groups.
>> >>
>> > If you have many users then using groups could help to shorten training
>> > time.
>> >
>>
>> Only 5 users with very different mail. My old solution was a single-user
>> spam filter, which resulted in very, very bad accuracy. I found dspam and
>> was surprised how well it works (200 or 300 mails and it rocks)! Learning
>> by forwarding was another big hit, because we use POP3, and how else should
>> we train the filter that runs on a gateway?
>>
> Either with the DSPAM Web UI or directly from within the email client (we
> have plugins for Mozilla Thunderbird, Lotus Notes and Microsoft Outlook
> (and possibly others. Just ask here and I am sure someone has made
> something you could reuse)).
>
It wasn't a real question (I did tell you I can't write English :(). It was
more a reason to use dspam: the ways to do this work out of the box :) Okay,
setting up the web GUI is more of a ..., but I have seen that a replacement is
in the works/planned.
>
>
>> Sebastian
>>
> Steve

thanks again,
Sebastian

ps: is it possible to set the Reply-To on this mailing list to
[email protected]? When I just click "Reply", only the person who wrote
the mail ends up in the To field :(

_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
