Hello,

Thanks for the advice, notrain might be a really good trick. I think I 
want our users to be able to set up customer training per email address. 
I work for Hostgator.com. I can't say a ton more than that, but we do host 
a ton of domains. Our last newsletter (blog.hostgator.com) has some 
statistics on our current size.

The plan is to use a cluster of dspam boxes as email filters. Email 
delivery is still going to be handled on the 2,000+ local hosting boxes.

As far as code contributions go, I am afraid I am not much of a coder and 
a total C novice. So I would not expect much there, at least not anytime 
soon. Here are the things I have developed and would be happy to 
document at some point:

1) Some docs on using MySQL partitioning to improve the interactive 
performance of dspam for large systems.
2) A rewrite of the maintenance SQL that works on a per-uid basis, so 
each query is much faster / nicer to the system.
3) Assuming I can work out all of the kinks, some documentation on using 
and tuning PBXT for better overall MySQL performance with dspam.
4) Some simple SpamAssassin rules that convert the dspam result headers 
to SpamAssassin spam scores.
5) Some documents on the specific Exim ACLs and rules used for cPanel 
systems to reduce the amount of email needing to be scanned. Currently I 
have:

a) a quick Exim ACL that skips email generated locally on the server.
b) an Exim ACL that does not scan email sent to a catchall.
c) an Exim ACL that uses dnswl.org and does not scan email from trusted 
whitelisted senders.
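
To give a rough idea of what I mean by item 2, here is a minimal sketch 
of a per-uid purge, not the actual maintenance SQL. It assumes the 
dspam_token_data columns from the MySQL driver (uid, spam_hits, 
innocent_hits, last_hit). The function name and the 60-day / 5-hit 
thresholds are placeholders I made up for illustration, and sqlite3 
stands in for MySQL only so that the example runs anywhere.

```python
import sqlite3
from datetime import date, timedelta


def purge_stale_tokens_per_uid(conn, today, max_age_days=60, min_hits=5):
    """Delete stale, low-hit tokens one uid at a time.

    Hypothetical helper: the thresholds are illustrative, not dspam's own.
    Returns a dict mapping each uid to the number of rows deleted for it.
    """
    # ISO dates compare correctly as strings in this sqlite stand-in.
    cutoff = (today - timedelta(days=max_age_days)).isoformat()

    # Collect the uids first so each DELETE touches only one user's rows.
    uids = [row[0] for row in conn.execute(
        "SELECT DISTINCT uid FROM dspam_token_data ORDER BY uid")]

    deleted = {}
    for uid in uids:
        cur = conn.execute(
            "DELETE FROM dspam_token_data "
            "WHERE uid = ? "
            "AND (innocent_hits * 2) + spam_hits < ? "
            "AND last_hit < ?",
            (uid, min_hits, cutoff),
        )
        # Commit per uid to keep each transaction (and its locks) short.
        conn.commit()
        deleted[uid] = cur.rowcount
    return deleted
```

The idea is simply that many small deletes, each restricted to one uid, 
are kinder to a busy database than one huge delete across every user.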

I am cc:ing this back to the list to hear whether any of the tricks I 
have would be useful to document for a wider audience.

Nate Custer
Hostgator.com

Imposit.com - webmaster wrote:
> Hmm, as a quick idea I would say
> trainmode none
> and work with merged groups.
>
> and do offline training.
> that way you massively reduce the amount of data.
> instead of 3 million (who the hell are you working for? godaddy?)
> datasets you get far fewer.
>
> of course I don't think you can use the same dataset for every user, but maybe 
> you can find some things in common for groups.
> like go by country or business branch in your database, extract their 
> account names and build groups with them
>
> but if you're that big maybe you want to develop a better extension for groups 
> based on databases instead of files :-) 
> code contributions welcome hehe
>
> anyway at this size, I would go for no train, tokenizer osb and a long talk 
> with stevan 
>
> but you know you can network dspam instances?
>
> how you wanna setup


_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
