On Fri, 06 Aug 2010 14:34:06 -0500
Nate Custer <ncus...@hostgator.com> wrote:

> Hello,
> 
Hello Nate,


> I am trying to get a dspam working for a very large setup (3 million+ 
> domains) as such am running into some issues with the dspam database, to 
> handle the rate of email filtering needed for each server, I am 
> currently using the pbxt database engine, with 50 partitions of the 
> dspam_token_data and dspam_signature_data partitioned based on a hash of 
> the uid. This has allowed me to reach throughput levels of 20 emails a 
> second on a 4 core machine with 8 dspam threads.
> 
My test system is a p...@2.8ghz, and I am able to classify around 21 messages
per second using corpus data from an NFS share. This is on a MySQL 5.1.49
master<->master setup. Those 21 messages per second are pure classification;
learning is not that fast.
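
For readers following the partitioning scheme quoted above, here is a minimal
sketch of how rows would spread over 50 partitions. It assumes MySQL-style
PARTITION BY HASH(uid), where the partition index is MOD(uid, N); the quoted
message does not say which hash function is actually used, and the names
partition_for and partition_load are made up for illustration.

```python
from collections import Counter

NUM_PARTITIONS = 50  # partition count taken from the quoted setup


def partition_for(uid: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Return the partition index for a uid.

    Assumption: plain modulo hashing, as MySQL's PARTITION BY HASH(expr)
    does for a non-negative integer expression.
    """
    if uid < 0:
        raise ValueError("uid must be non-negative")
    return uid % num_partitions


def partition_load(uids, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many of the given uids land in each partition."""
    return Counter(partition_for(uid, num_partitions) for uid in uids)


if __name__ == "__main__":
    # Hypothetical uid range, only to show how evenly the rows spread.
    load = partition_load(range(1000))
    print(min(load.values()), max(load.values()))
```

With sequential uids the spread is even, but a few very busy users can still
make single partitions much hotter than the rest.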


> I was wondering what other techniques large installations use to keep 
> the database size under control and database performance at a very high 
> level.
> 
Both the tokenizer and caching on the storage backend are important. How much
memory do you have on those systems? Can you post the changes you have made to
my.cnf for caching? And can you post your dspam.conf?


> Thanks very much,
> 
> Nate Custer
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
