On Fri, 29 Jan 2010 08:26:44 +0100
"[email protected]" <[email protected]> wrote:

> Our users are able to train DSPAM, CRM114 and SA.
> They share the same dataset.
>
So basically one user could mess up the whole data set for all other users. Is 
that really something you want?


> We use Postfix as our global MTA, but we don't use it for retraining (no
> special alias).
>
So Postfix is acting as an edge MTA, right? Do you use anything else in Postfix,
such as SPF, DKIM, SenderID, milters, policy delegation, etc.? If so, what?


> In order to retrain FPs, our customers can move email into 2 IMAP
> folders in their mailbox, one for spam learning, the other for ham
> learning.
> These feed 2 special folders on one centralized server from which we can
> run learning scripts.
> The script runs sa-learn for SA. For DSPAM, it checks the email headers,
> and if DSPAM does not agree with the classification, the email is retrained
> with the command:
> /usr/bin/dspam --client --user amavis --class=spam --source=error  (or
> --class=innocent for ham, of course)
> 
That sounds pretty much like what the Dovecot Anti-Spam plugin does. How do you
handle POP users? How do they retrain?
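The header check described above could look roughly like this. It is a minimal
Python sketch, assuming DSPAM stamps its verdict in an `X-DSPAM-Result` header
("Spam" or "Innocent") and that each message is read as raw bytes from one of
the two learning folders. The function names are hypothetical; only the dspam
command line comes from the thread (DSPAM calls the ham class `innocent`).

```python
import email
import subprocess
from email.message import Message
from typing import Optional

# Assumption: DSPAM records its verdict in an "X-DSPAM-Result" header
# (values such as "Spam" or "Innocent"). Adjust to your setup.
DSPAM_HEADER = "X-DSPAM-Result"
VALID_CLASSES = {"spam", "innocent"}


def dspam_verdict(msg: Message) -> Optional[str]:
    """Return DSPAM's verdict in lower case, or None if the header is missing."""
    value = msg.get(DSPAM_HEADER)
    return value.strip().lower() if value else None


def needs_retraining(msg: Message, target: str) -> bool:
    """True if the folder the user chose (target) contradicts DSPAM's verdict."""
    if target not in VALID_CLASSES:
        raise ValueError(f"unknown class: {target!r}")
    verdict = dspam_verdict(msg)
    # Only correct messages DSPAM actually classified as spam or innocent;
    # anything else (missing header, whitelisted, ...) is left alone here.
    return verdict in VALID_CLASSES and verdict != target


def retrain_command(target: str) -> list[str]:
    """Build the dspam invocation quoted in the thread (user 'amavis')."""
    return ["/usr/bin/dspam", "--client", "--user", "amavis",
            f"--class={target}", "--source=error"]


def retrain(raw: bytes, target: str) -> bool:
    """Pipe the raw message to dspam on stdin, but only if it was misclassified."""
    msg = email.message_from_bytes(raw)
    if not needs_retraining(msg, target):
        return False
    subprocess.run(retrain_command(target), input=raw, check=True)
    return True
```

Skipping messages that DSPAM already classified correctly avoids retraining
with `--source=error` when there was no error, which would skew the token
counts.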


> This retraining greatly increases the accuracy of the 3 engines.
> 
> Autolearning is trickier because it relies massively on the
> heuristics engine (main scoring) to adjust the statistical engines (SA
> Bayes, CRM) on the fly.
> But I agree with you: what's the point of using the 3 statistical
> engines this way?
> For SA it's OK, but for CRM114 and DSPAM, I wonder if it's really
> clever.
> 
I personally would say that it's not clever.


> So I think I will let DSPAM do its job, and continue to use its scoring
> to balance the others.
>
As an ISP you should consider using groups in DSPAM and splitting DSPAM so that
every user has his/her own data set. For your scenario I would use a merged
group. Then you could train just that merged group while leaving it up to each
user to train his/her own data. I would only feed spam honeypots to the merged
group, and from time to time I would feed it some ham. Or maybe set up a
mechanism to feed each user's outbound mail to his/her data set in order to get
bulk ham data.
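To make the merged-group idea concrete: the user's own data and the shared
group's data are both consulted at classification time, while training only
touches one of them. The Python sketch below models this under the simplifying
assumption that the two are combined by summing per-token spam/innocent counts.
`TokenStats` and `merge` are hypothetical names, and this is not DSPAM's actual
storage-driver code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenStats:
    """Hypothetical per-token counters: how often a token was seen in spam/ham."""
    spam_hits: int = 0
    innocent_hits: int = 0


def merge(user: dict[str, TokenStats],
          group: dict[str, TokenStats]) -> dict[str, TokenStats]:
    """Combine a user's own data with the shared group's data by summing counts.

    Neither input is modified, so one user's training never touches the
    shared group or another user's data.
    """
    merged = dict(group)
    for token, own in user.items():
        shared = merged.get(token, TokenStats())
        merged[token] = TokenStats(shared.spam_hits + own.spam_hits,
                                   shared.innocent_hits + own.innocent_hits)
    return merged
```

For example, with `group = {"meeting": TokenStats(2, 10)}` and
`alice = {"meeting": TokenStats(0, 25)}`, `merge(alice, group)["meeting"]` is
`TokenStats(2, 35)`, and `group` itself is unchanged. This is why a single user
can no longer mess up the data set for everyone else.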


> That's the way it works currently, and I'm really satisfied: accuracy is
> great and FPs are very low.
> 
My current setup sees about 1% spam volume, but I use a policy delegation
service that blocks 60% to 80% of inbound mail. For the remaining inbound mail
(excluding what is blocked) my FP/FN count is very, very low. I have no numbers
handy, but it is likewise a single-digit percentage.


> And maybe I will do the same with CRM114.
> 
> So I will give the DSPAM plugin at
> http://eric.lubow.org/projects/dspam-spamassassin-plugin/ a try because, if
> I understand correctly, it can be used to balance scoring more
> precisely.
> 
> Thanks for your help on this
> Regards,
> Tonio
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user