Hi folks,

Further to the issues I posted last week, I have moved my backup server 
from CentOS with Perl v5.10 to Ubuntu with Perl v5.12, and the stuck 
threads issue has cleared up. Since I have reapplied all of my specific 
configs on Ubuntu, the problem must lie either in the CentOS 
configuration or in Perl v5.10 itself. I know someone posted a link about 
getting Perl v5.12 running on CentOS, but the general advice from CentOS 
is not to change the base Perl version, as so much relies on it.

Things are mostly staying up; however, I am having problems with 
rebuildspamdb. It takes an extremely long time to run and drives the 
system load up to 30-40. It can take up to five minutes just to run su on 
the machine, and SMTP connections quite often time out. The current run 
has been going for 52229 s (about 14.5 hours). It has got through the 
error folders and the spam folder and is now chugging away very slowly 
through the notspam folder.

I can't see a way to get any extra information out of rebuildspamdb 
other than turning on general debug mode, which produces a lot of 
unrelated output. With debug on, all I see for Worker_10001 is the 
following, repeated many times with different percentages:

 >2012-02-09 10:26:21 [Worker_10001] <ASSP_WordStem - set_active_languages
 >2012-02-09 10:26:21 [Worker_10001] <ASSP_WordStem - cleanup HTML Tags
 >2012-02-09 10:26:21 [Worker_10001] <ASSP_WordStem - cleanup exception words
 >2012-02-09 10:26:21 [Worker_10001] <ASSP_WordStem language detection
 >2012-02-09 10:26:21 [Worker_10001] <language en detected to 35.54 percent
 >2012-02-09 10:26:21 [Worker_10001] <language da detected to 8.57 percent
 >2012-02-09 10:26:21 [Worker_10001] <language fr detected to 8.35 percent
 >2012-02-09 10:26:21 [Worker_10001] <language it detected to 7.81 percent
 >2012-02-09 10:26:21 [Worker_10001] <language ro detected to 7.78 percent
 >2012-02-09 10:26:21 [Worker_10001] <language sv detected to 6.48 percent
 >2012-02-09 10:26:21 [Worker_10001] <language nl detected to 6.22 percent
 >2012-02-09 10:26:21 [Worker_10001] <language es detected to 4.97 percent
 >2012-02-09 10:26:21 [Worker_10001] <language pt detected to 4.56 percent
 >2012-02-09 10:26:21 [Worker_10001] <language de detected to 3.11 percent
 >2012-02-09 10:26:21 [Worker_10001] <language fi detected to 2.78 percent
 >2012-02-09 10:26:21 [Worker_10001] <language tr detected to 1.97 percent
 >2012-02-09 10:26:21 [Worker_10001] <language hu detected to 1.84 percent
 >2012-02-09 10:26:21 [Worker_10001] <ASSP_WordStem start word stemming
 >2012-02-09 10:26:21 [Worker_10001] <ASSP_WordStem process word stem - with StopWords cleanup
 >2012-02-09 10:26:21 [Worker_10001] <ASSP_WordStem finished

I presume this is WordStem being used to generate the Bayesian pairs, 
but I have no idea how to make it cause less load. CPU usage on the 
server is actually quite low despite the high load average, so I suspect 
an I/O bottleneck.
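One way to test the I/O suspicion: on Linux the load average counts processes stuck in uninterruptible sleep (usually waiting on disk) as well as runnable ones, so a load of 30-40 with low CPU usage points at I/O wait. The `wa` column of `vmstat` shows this directly; the minimal sketch below does the same by sampling the aggregate `cpu` line of `/proc/stat` twice, assuming a Linux kernel that exposes the iowait counter. The helper names (`parse_cpu_line`, `cpu_breakdown`, `read_cpu_counters`) are hypothetical and not part of ASSP.

```python
import os
import time


def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def cpu_breakdown(before, after):
    """Percentage of elapsed jiffies spent busy, idle and in iowait.

    Only the first eight counters (user, nice, system, idle, iowait, irq,
    softirq, steal) are summed, because guest time is already counted in
    user/nice.
    """
    delta = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(delta)
    if total <= 0:
        return {"busy": 0.0, "idle": 0.0, "iowait": 0.0}
    idle, iowait = delta[3], delta[4]
    return {
        "busy": 100.0 * (total - idle - iowait) / total,
        "idle": 100.0 * idle / total,
        "iowait": 100.0 * iowait / total,
    }


def read_cpu_counters(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        first = read_cpu_counters()
        time.sleep(5)  # sample while rebuildspamdb is running
        second = read_cpu_counters()
        stats = cpu_breakdown(first, second)
        print("busy %.1f%%  idle %.1f%%  iowait %.1f%%"
              % (stats["busy"], stats["idle"], stats["iowait"]))
        print("load average (1/5/15 min):", os.getloadavg())
```

If iowait dominates while WordStem is enabled and drops back when it is disabled, that would support the disk-bound theory rather than a CPU problem.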

If I turn off the WordStem module, rebuildspamdb runs as it used to: 
higher CPU usage, but a load between one and two, and it gets through the 
corpus much, much more quickly.

So, any ideas on how to make WordStem less of a resource hog?

All the best,
Colin.

_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test