On Tue, Mar 29, 2011 at 05:24:28PM +0300, Ibrahim Harrani wrote:
> Hi Kenneth,
>
> Thanks for your prompt reply.
> Yes, this is from a single user. But I am planning to use this user as a
> global user that will be managed by admins.
> I trained all spam with the same --username.
> I changed the fillfactor to 90 after the training, not at the beginning,
> but this did not solve the problem.
>
> Algorithm graham burton
> Tokenizer chain
>
> What do you suggest for the number of training ham/spam mails?
> Are 2K mails enough? I trained dspam with the TEFT option. After the
> training I switched to TOE in dspam.conf.
> I would like to reduce the database size (currently 600MB) without losing
> spam catch rate.
>
>
> Here is the debug log. As you see, there is a 22 second delay between the
> "pgsql query..." line and the BNR pattern.
> It seems dspam spends this time in the database query.
>
Okay. I think that at the very least you will need to purge the useless
tokens from your training corpus: all of the tokens with nearly equal
ham/spam counts, as well as the tokens with a very small ham/spam count.

How did you change the fillfactor? You would need to do an ALTER TABLE
followed by a CLUSTER to rewrite the table and include the free space
required by the fillfactor. Is that how you performed that operation?
You could also do a full copy to a new table with the correct fillfactor.

One other thing to try would be to use Markov/OSB instead of CHAIN. OSB
generates a few more tokens than CHAIN, but it is much more accurate, so
you will need fewer tokens to actually identify the ham/spam. Then,
instead of simply loading all of your messages at once, load them
incrementally and only train if the existing corpus fails to correctly
identify the message.

Using something like iostat while you are processing should give you an
idea of whether you are I/O bound or not.

And lastly, make certain that you have "synchronous_commit = off" in your
postgresql.conf file.

Cheers,
Ken

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
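
To illustrate the token purge suggested above, here is a minimal sketch in
Python of how one might pick out low-value tokens. It is not DSPAM code: the
function name, the dictionary layout, and the two thresholds (min_total and
max_bias) are assumptions chosen for illustration, and suitable values would
have to be found by experiment against your own corpus.

```python
def select_tokens_to_purge(tokens, min_total=5, max_bias=0.2):
    """Return the token ids that carry little classification signal.

    tokens maps a token id to a (spam_hits, ham_hits) pair.
    min_total and max_bias are hypothetical thresholds, not DSPAM defaults.
    """
    purge = []
    for token, (spam_hits, ham_hits) in tokens.items():
        total = spam_hits + ham_hits
        # Tokens seen only a handful of times say little either way.
        if total < min_total:
            purge.append(token)
            continue
        # Tokens with nearly equal ham/spam counts do not discriminate.
        bias = abs(spam_hits - ham_hits) / total
        if bias < max_bias:
            purge.append(token)
    return purge


# Made-up sample counts, for illustration only.
sample = {
    "viagra": (120, 1),
    "meeting": (2, 95),
    "the": (500, 510),
    "xqzt": (1, 0),
}
to_purge = select_tokens_to_purge(sample)
print(to_purge)
```

In this sample, "the" is selected because its counts are nearly balanced and
"xqzt" because it has been seen only once, while the two strongly biased
tokens are kept. The same two conditions could be expressed as a DELETE
against the token table once the thresholds have been validated on a copy of
the data.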