Re: [Dspam-user] Upgrade dspam 3.6.8 to 3.9.0-git

Sebastian Toepfer Fri, 31 Jul 2009 00:32:27 -0700

Hello Steve,
On Fri, 31 Jul 2009 01:09:18 +0200, "Steve" <[email protected]> wrote:
> -------- Original Message --------
>> Date: Thu, 30 Jul 2009 19:01:37 +0200
>> From: "Sebastian Toepfer" <[email protected]>
>> To: [email protected]
>> Subject: Re: [Dspam-user] Upgrade dspam 3.6.8 to 3.9.0-git
> 
>> Hello Steve,
>> 
> Hello Sebastian
> 
> 
>> thanks, my holiday is rescued :)
>>
> Why? What did I write that was good enough to rescue your holiday?
>


Everything. Now I can do the migration and still have some of my
vacation left :)

[...]
> 
>> >
>> >> to change the tokenizer without retraining for the
>> >> users. Because I use dspam at home, and the "users" have trained dspam
>> >> for about 3 years, and they will kill me if they must do this again :(
>> >>
>> > If I understand that right you are asking if you could shorten the
>> > training for the new installation by using old data. Right? Yes! You
>> > can do that. You could dump or copy the old data and import it on the
>> > new installation. But if I see that right then you are planning to
>> > change the tokenizer, and changing the tokenizer mostly means that the
>> > old data is useless.
>> >
>> 
>> Bad news ... I've read that other tokenizers are better,
>>
> Better in what? If it were so clear which tokenizer is the best then we
> would probably remove all the others. But it's not that easy. For some
> setups tokenizer A is better than tokenizer B and so on...
> 

No, I don't know which is the best, but I found some references that
recommend OSB or SBPH.

> 
>> why is it not possible to migrate the data from one tokenizer to
>> another? Is it a problem with how dspam creates these tokens; does it
>> only work one way?
>> 
> Yep. The reason is very simple:
> 1) Not all tokenizers use the same schema/pattern
> 2) There is no chain information saved inside the token
> 3) Computing from normal text to token is easy but the way back is hard
> 
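As a rough illustration of the three points above, here is a minimal
Python sketch. It is not DSPAM's code: the function names, the "+" joiner
and the SHA-256-based 64-bit key are assumptions chosen for this example.
It only shows that a word-style and a chain-style (word pair) tokenizer
produce different sets of hashed keys, and that a stored key carries no
hint about which words or which pattern produced it.

```python
import hashlib


def token_id(text: str) -> int:
    """Hash a token string to a 64-bit integer.

    Assumption: a stand-in for whatever key the storage backend keeps;
    the original text cannot be read back from this number.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def word_tokens(message: str) -> list[str]:
    """Hypothetical word-style tokenizer: one token per word."""
    return message.lower().split()


def chain_tokens(message: str) -> list[str]:
    """Hypothetical chain-style tokenizer: single words plus adjacent pairs."""
    words = message.lower().split()
    pairs = [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return words + pairs


message = "buy cheap pills now"
word_ids = {token_id(t) for t in word_tokens(message)}
chain_ids = {token_id(t) for t in chain_tokens(message)}

# Keys that exist only in the chain-style data. Nothing stored under the
# word-style keys tells us which pairs occurred, or how often.
pair_only_ids = chain_ids - word_ids
print(len(word_ids), len(chain_ids), len(pair_only_ids))
```

Counts stored under the word keys say nothing about the pair keys, so
statistics trained with one pattern cannot be converted into those of
another without the original mails.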
> 
> I am now going to explain in depth how the tokenizers create the
> tokens/patterns. I do that because I hope new users will search the
> mailing list archives and stop asking the same question over and over.
> I will just show the token generating part. Internally DSPAM uses
> algorithms for calculating the probability and the confidence factor. I
> am not going to explain the latter two parts, just the token creation.
> Besides the token creation, DSPAM puts different weights on the generated
> tokens depending on which tokenizer is used. I am not going to explain
> that either. I have already done that in the past, and the info about the
> weight of the tokens inside the tokenizers is explained there. If you
> need that info then please search the mailing list and read more about it
> there.
> 
> 
> So now the technical mumbo jumbo. Let me explain:
> --------------------------------------------------
[...]
Thanks for this explanation. It's all clear now.
> 
> 
> 
>> > 3 years of data is all fine and okay but to be honest you will not
>> > lose much. Just the first days will lead to more training but after a
>> > short time DSPAM will catch up and be very accurate.
>> >
>> 
>> It's a small installation, only about 30,000 mails in these 3 years ...
>> and 20,000 of them are mine :) .. so I think it will take a year to
>> reach the current accuracy.
>>
> No way. A year? NEVER! Expect a bunch of corrections (in the 2-digit
> area) and you would already be easily above 90% or even 95%. Just take
> something like OSB or CHAIN. Don't go with WORD in your case.
> 
See my question about the tokenizer: I'll switch it. After this answer I
will do it and remove all training data.
> 
>> Or how long do you think it will take with this low volume?
>> E.g. one user has only 700 ham but 1500 spam (accuracy 91.40%; she
>> loves dspam :)).
>> 
> Not much time. Really. And you could still pretrain a merged or
> shared,merged group and speed up the process. You can find SPAM corpora
> everywhere on the net (there are (almost) as many of them as grains of
> sand on the beach).
> 
But which is the best for German users, where one user receives English
newsletters/mailing lists? Everything I've tested resulted in bad
accuracy. I hate false negatives; they are the worst thing a spam filter
can do. See GMX .. you must check all your spam daily to find the
newsletter :(. If false negatives are at a low level then the user checks
the quarantine once a week/month and all is okay.
> 
>> >
>> >> any other pitfalls?
>> >>
>> > Not really.
>> >
>> 
>> Very good news.
>> 
> :)
> 
> 
>> >
>> >> I use dspam with mysql as backend and without groups.
>> >>
>> > If you have many users then using groups could help to shorten
>> > training time.
>> >
>> 
>> Only 5 users with very different mails. My old solution was a
>> single-user spam filter which resulted in very, very bad accuracy. I
>> found dspam and was surprised how well it works (200 or 300 mails and it
>> rocks)! The learning by forwarding was another big hit, because we use
>> POP3, and how should we train a filter which runs on a gateway?
>> 
> Either with the DSPAM Web UI or directly from within the email client (we
> have plugins for Mozilla Thunderbird, Lotus Notes and Microsoft Outlook
> (and possibly others. Just ask here and I am sure someone has made
> something you could reuse)).
> 
It wasn't a real question (I did say that I can't write English :(). It
was more a reason to use dspam, because the ways to do this work out of
the box :)
Okay, setting up the web GUI is more of a ..., but I have seen that a
replacement is in the works/planned.
> 
> 
>> Sebastian 
>> 
> Steve
thanks again,
Sebastian

PS: Is it possible to set the Reply-To on this mailing list to
[email protected]? I only click on reply, and then only the
person who wrote the mail is in the To field :(

_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user