Hello,

Thanks for the advice; notrain might be a really good trick. That said, I think I want our users to be able to set up custom training per email address. I work for Hostgator.com and can't say a ton more than that, but we do host a ton of domains. Our last newsletter (blog.hostgator.com) has some statistics on our current size.
The plan is to use a cluster of DSPAM boxes as email filters. Email delivery will still be handled on the 2,000+ local hosting boxes.

As far as code contributions go, I am afraid I am not much of a coder and a total C novice, so I would not expect much there, at least not anytime soon. Here are the things I have developed and would be happy to document at some point:

1) Some docs on using MySQL partitioning to improve the interactive performance of DSPAM on large systems.
2) A rewrite of the maintenance SQL that works on a per-UID basis, so each query is much faster and easier on the system.
3) Assuming I can work out all of the kinks, some documentation on using and tuning PBXT for better overall MySQL performance with DSPAM.
4) Some simple SpamAssassin rules that convert the DSPAM result headers into SpamAssassin spam scores.
5) Some documents on the specific Exim ACLs and rules used on cPanel systems to reduce the amount of spam that needs to be scanned. Currently I have:
   a) a quick Exim ACL that skips email generated locally on the server;
   b) an Exim ACL that does not scan email sent to a catch-all address;
   c) an Exim ACL that uses dnswl.org to skip scanning email from trusted, whitelisted senders.

I am CC'ing this back to the list to hear whether any of these tricks would be useful to document for a wider audience.

Nate Custer
Hostgator.com

Imposit.com - webmaster wrote:
> Hmm, as a quick idea I would say
> trainmode none
> and work with merged groups.
>
> And do offline training.
> That way you massively reduce the amount of data.
> Instead of 3 million (what the hell are you working for? GoDaddy?)
> datasets you get much less.
>
> Of course I don't think you can use the same dataset for every user, but maybe
> you find some things in common for groups.
> Like, go for country or business branch in your database, extract their
> account names and build groups with them.
>
> But if you're so big, maybe you want to develop a better extension for groups
> based on databases instead of files :-)
> Code contributions welcome, hehe.
>
> Anyway, at this size I would go for no train, tokenizer osb and a long talk
> with Stevan.
>
> But you know you can network dspam instances?
>
> how you wanna setup

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user