On Tue, Mar 29, 2011 at 11:45:39AM +0300, Ibrahim Harrani wrote:
> Hi,
>
> I am testing git version of dspam with PostgreSQL 9.0 running on
> FreeBSD 8 (Dual core cpu, 4 GB memory)
>
> I trained dspam with 110K spam and 50K ham mails. Now I have more than
> 7 million entry on dspam.
>
> dspam=# SELECT count(*) from dspam_token_data ;
>   count
> ---------
>  7075311
> (1 row)
>
> I vacuum and reindex database regularly.
>
> When I start the dspam, processing an email tooks 40-50 sec at the
> beginning than drops to 10sec.
> If I made this test with more powerful server(quad core cpu with 16GB
> memory). it takes 0.01 secs.
> I belive that the problem with the small server about large database
> entries. but I would like to get better performance
> on the small server as well. Any idea?
>
> Do you think that sqlite might be better then pgsql on this setup? or
> did I train dspam with alots of spam/ham?
>
> Thanks.
>

Hi Ibrahim,

Are these 7 million tokens for a single user? What tokenizer are you
using: WORD, CHAIN, MARKOV/OSB, MARKOV/SBPH? That seems like an awful
lot of training. The docs usually recommend 2k messages each of ham
and spam. When we generated a base corpus for our user community, we
pruned the resulting millions of tokens down to about 300k.

Another thing that can help is to cluster your data on the uid+token
index. It looks like you cannot keep the full active token pages in
memory with only a 4GB system. Look at your paging/swapping stats. You
may be able to reduce your memory footprint, which should help your
performance. Do you have your FILL FACTOR set to allow HOT updates?

Cheers,
Ken
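
For reference, the clustering and fill-factor suggestions could look roughly like the sketch below. It is only an illustration under stated assumptions: the index name `dspam_token_data_uid_key` is a guess at what the unique (uid, token) constraint is called on a given install (check with `\d dspam_token_data` in psql), and the fillfactor of 90 is an arbitrary starting value, not a recommendation from the dspam docs. Note that CLUSTER takes an exclusive lock on the table while it rewrites it, so it is best run while dspam is stopped.

```sql
-- How large are the table and its indexes compared to the 4GB of RAM?
SELECT pg_size_pretty(pg_total_relation_size('dspam_token_data'));

-- What share of updates are already HOT updates?
SELECT n_tup_upd, n_tup_hot_upd
  FROM pg_stat_user_tables
 WHERE relname = 'dspam_token_data';

-- Leave some free space in each heap page so an updated row version can
-- stay on the same page (a precondition for HOT updates).
-- ASSUMPTION: 90 is an example value only; tune it for your workload.
ALTER TABLE dspam_token_data SET (fillfactor = 90);

-- Rewrite the table in uid+token order so one user's tokens sit on
-- neighbouring pages. The rewrite is also what applies the new
-- fillfactor to the existing rows.
-- ASSUMPTION: the index name below is hypothetical; use your own.
CLUSTER dspam_token_data USING dspam_token_data_uid_key;

-- Refresh planner statistics after the rewrite.
ANALYZE dspam_token_data;
```

Re-running the two SELECT statements after a day or so of normal mail flow should show whether the table shrank and whether the proportion of HOT updates went up.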