On Fri, 29 Jan 2010 08:26:44 +0100 "[email protected]" <[email protected]> wrote:
> our users are able to train dspam, crm114 and SA.
> They share the same dateset.
>
So basically one user could mess up the whole data set for all other users. Is that really something you want?

> We use postfix as global MTA, but we dont use it to retraining. (no
> special alias)
>
Postfix is acting as an edge MTA, right? Do you use anything else in Postfix, such as SPF, DKIM, SenderID, Milters, Policy Delegation, etc.? If so, what?

> In order to retrain FP, our customers can move email into 2 imap
> folders in their mailbox, one for spam learning, the other for ham
> learning.
> it feeds 2 special folders on one centralized server from which we can
> apply learning scripts.
> This script do sa-learn for SA and for DSPAM, it checks email headers
> and if dspam is not agree with classification, email is retrained with
> command:
> /usr/bin/dspam --client --user amavis --class=spam --source=error (or
> class=ham of course)
>
That sounds pretty much like what the Dovecot Anti-Spam plugin does. How do you handle POP users? How do they retrain?

> This retraining increase greatly accuracy of the 3 engines.
>
> Autolearning is more tricky because it will massively rely on
> heuristics engine (main scoring) to adjusts statistical engine (SA
> bayes, CRM) on the fly.
> But i'm agree with you, what's the point to use the 3 statisticals
> engine this way.
> For SA, it's OK, but for CRM114 and DSPAM, I'm wonder if it's really
> clever.
>
I personally would say that it's not clever.

> So I think i will let DSPAM do his job, and continue use his scoring
> to balance the others.
>
As an ISP you should consider using groups in DSPAM and splitting DSPAM so that every user has his/her own data set. I see a merged group for your scenario. Then you could train just that merged group yourself while leaving it up to each user to train his/her own data. I would only feed spam from honeypots to the merged group, and from time to time I would feed it some ham.
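For reference, the retraining decision described in the quoted mail (look at the headers, and retrain only when DSPAM disagrees with the folder the user picked) could look roughly like the sketch below. It is not the original script. It assumes that DSPAM's verdict is available in an `X-DSPAM-Result` header with values such as `Spam` or `Innocent`, and the function names are made up for illustration. The quoted mail writes `class=ham`; as far as I know the dspam client names that class `innocent`, so the constant below uses that value and should be checked against your DSPAM version.

```python
import email
from typing import List, Optional

DSPAM_BIN = "/usr/bin/dspam"   # path taken from the quoted command
DSPAM_USER = "amavis"          # user taken from the quoted command

# Assumption: the dspam client calls the non-spam class "innocent".
FOLDER_TO_DSPAM_CLASS = {"spam": "spam", "ham": "innocent"}

# Assumption: values of the X-DSPAM-Result header mapped to folder classes.
RESULT_TO_FOLDER_CLASS = {"spam": "spam", "innocent": "ham", "whitelisted": "ham"}


def dspam_verdict(raw_message: bytes) -> Optional[str]:
    """Return "spam" or "ham" according to the DSPAM header, or None if unknown."""
    msg = email.message_from_bytes(raw_message)
    result = msg.get("X-DSPAM-Result")
    if result is None:
        return None
    return RESULT_TO_FOLDER_CLASS.get(result.strip().lower())


def retrain_command(raw_message: bytes, folder_class: str) -> Optional[List[str]]:
    """Build the dspam argv for a message found in the spam or ham learning folder.

    Returns None when DSPAM already agrees with the folder, or when there is
    no recognizable verdict and therefore no error to correct.
    """
    if folder_class not in FOLDER_TO_DSPAM_CLASS:
        raise ValueError("folder_class must be 'spam' or 'ham'")
    verdict = dspam_verdict(raw_message)
    if verdict is None or verdict == folder_class:
        return None
    return [
        DSPAM_BIN, "--client", "--user", DSPAM_USER,
        "--class=" + FOLDER_TO_DSPAM_CLASS[folder_class],
        "--source=error",
    ]
```

The returned list can be handed to `subprocess.run(cmd, input=raw_message)` so that the message is passed to dspam on standard input. Only building the command here keeps the decision logic testable without a running DSPAM server.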
Alternatively, you could set up a mechanism that feeds each user's outbound mail into his/her own data set in order to collect bulk ham data.

> It's the way it works actually, and I'm really satisfied: accuracy is
> great and FP are very low.
>
My current setup has about 1% spam volume, but I use a Policy Delegation service to block 60% to 80% of inbound mail. Of the inbound mail that is not blocked, I have a very, very low FP/FN rate. I have no numbers handy, but it is very low (also a single-digit percentage).

> And may be I will do the same with CRM114.
>
> So I will give it a try to dspam plugin at
> http://eric.lubow.org/projects/dspam-spamassassin-plugin/ because, if
> i'm understand correctly, it can be used to balance scoring more
> precisely.
>
> Thanks for your help on this
> Regards,
> Tonio
>

-- 
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
