On Tue, Mar 29, 2011 at 11:45:39AM +0300, Ibrahim Harrani wrote:
> Hi,
> 
> I am testing the git version of dspam with PostgreSQL 9.0 running on
> FreeBSD 8 (dual-core CPU, 4 GB memory).
> 
> I trained dspam with 110K spam and 50K ham messages. Now I have more than
> 7 million entries in dspam.
> 
> dspam=# SELECT count(*) from dspam_token_data ;
>   count
> ---------
>  7075311
> (1 row)
> 
> I vacuum and reindex database regularly.
> 
> When I start dspam, processing an email takes 40-50 sec at the
> beginning, then drops to 10 sec.
> If I run the same test on a more powerful server (quad-core CPU with 16GB
> memory), it takes 0.01 sec.
> I believe the problem on the small server is the large number of database
> entries, but I would like to get better performance
> on the small server as well. Any ideas?
> 
> Do you think SQLite might be better than PostgreSQL on this setup? Or
> did I train dspam with too many spam/ham messages?
> 
> Thanks.
> 

Hi Ibrahim,

Are these 7 million tokens for a single user? What tokenizer are you
using: WORD, CHAIN, MARKOV/OSB, MARKOV/SBPH? That seems like an awful
lot of training. The docs usually recommend 2k messages each of ham
and spam. When we generated a base corpus for our user community,
we pruned the resulting millions of tokens down to about 300k. Another
thing that can help is to cluster your data on the uid+token index.
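A rough sketch of the per-user check, the pruning and the clustering in PostgreSQL. It assumes the stock DSPAM schema, where dspam_token_data has uid, token, spam_hits, innocent_hits and last_hit columns and a unique index on (uid, token); verify this with \d dspam_token_data first. The index name and the pruning thresholds below are hypothetical placeholders, not values from the DSPAM docs. Back up and stop dspam before running it, since CLUSTER takes an exclusive lock and rewrites the whole table.

```sql
-- How many tokens does each user own?
SELECT uid, count(*) AS tokens
  FROM dspam_token_data
 GROUP BY uid
 ORDER BY tokens DESC;

-- Prune rarely seen, stale tokens. Hypothetical thresholds:
-- seen fewer than 2 times in total and not hit in the last 60 days.
DELETE FROM dspam_token_data
 WHERE spam_hits + innocent_hits < 2
   AND last_hit < CURRENT_DATE - 60;

-- Physically reorder the table by the (uid, token) index so that one
-- user's tokens sit on adjacent pages. The index name is hypothetical;
-- use the one listed by \d dspam_token_data.
CLUSTER dspam_token_data USING dspam_token_data_uid_token_key;

-- Refresh planner statistics after the rewrite.
ANALYZE dspam_token_data;
```

CLUSTER also compacts away the space freed by the DELETE. The ordering is not maintained for rows inserted later, so it needs to be re-run from time to time.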
It looks like you cannot keep the full active token pages in memory
with only a 4GB system. Look at your paging/swapping stats. You may
be able to reduce your memory footprint, which should help your performance.
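To compare the token table's footprint with the memory PostgreSQL has to work with, something like the following should work on 9.0 (paging and swapping themselves are better watched from the OS, e.g. with vmstat on FreeBSD). It assumes the table is named dspam_token_data, as in the query above.

```sql
-- Heap only, and heap plus indexes and TOAST.
SELECT pg_size_pretty(pg_relation_size('dspam_token_data'))       AS table_only,
       pg_size_pretty(pg_total_relation_size('dspam_token_data')) AS with_indexes;

-- PostgreSQL's own buffer cache.
SHOW shared_buffers;

-- Planner hint about the OS cache; it does not allocate any memory.
SHOW effective_cache_size;
```

If with_indexes is a large fraction of the RAM left over after shared_buffers and the other processes on the box, the token lookups will keep going to disk.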
Do you have your fillfactor set to leave room for HOT updates?
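A minimal sketch of lowering the table's fillfactor and checking whether HOT updates happen, assuming the stock schema, where DSPAM updates spam_hits, innocent_hits and last_hit but only (uid, token) is indexed. HOT is only possible when no indexed column changes and the new row version fits on the same page. The value 70 is illustrative, not a tuned recommendation.

```sql
-- Leave about 30% of each heap page free for new row versions.
ALTER TABLE dspam_token_data SET (fillfactor = 70);

-- The setting only affects pages written from now on. In 9.0,
-- VACUUM FULL rewrites the table, so existing pages get the free
-- space too. It takes an exclusive lock while it runs.
VACUUM FULL dspam_token_data;

-- Compare total updates with HOT updates since the stats were reset.
SELECT n_tup_upd, n_tup_hot_upd
  FROM pg_stat_user_tables
 WHERE relname = 'dspam_token_data';
```

If n_tup_hot_upd stays far below n_tup_upd after some mail has been processed, most updates are still creating new index entries.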

Cheers,
Ken

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
