Hi all,

While investigating rebuild runtimes that were IMHO too long (I had 
assumed a bug in the code), I found that the problem was caused by only a 
few files in the corpus. Some of these files took one minute or more to 
process.
This long processing time does not depend on the size of the files, only 
on their contents. All of these files are extremely heavily HTML encoded 
and contain a very large number of HTML tags.
For example: after I removed 6 files from the corpus, the rebuild time 
decreased from 20 to 10 minutes.
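
To get a feeling for which corpus files might be affected, something like 
the following could be used to count HTML tags per file. This is only a 
rough sketch in Python, not code from ASSP: the names TagCounter and 
tag_density are made up for illustration, and a high tag count is just a 
hint, not proof, that a file will be slow to process.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening tags (hypothetical helper, not part of ASSP)."""

    def __init__(self):
        super().__init__()
        self.tags = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br/> also end up here by default.
        self.tags += 1


def tag_density(text):
    """Return (tag_count, tags_per_kilobyte) for one message body."""
    counter = TagCounter()
    counter.feed(text)
    counter.close()
    # Avoid division by zero for empty input.
    size_kb = max(len(text.encode("utf-8")), 1) / 1024
    return counter.tags, counter.tags / size_kb
```

Files with an unusually high number of tags per kilobyte would be the 
first candidates to check or to remove from the corpus.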

Such long processing times could lead to long-held record locks on 
Spamdb, Whitelistdb and Redlistdb, which could cause any other thread to 
get stuck (locked), and it will push the CPU usage of a single core to 
100% for a long time.

The next release will monitor the processing time of every file. For 
every corpus folder, the list of the 10 longest runtimes will be written 
to the rebuild log if the processing time for a single file exceeds one 
second. The file 'rebuilddebug.txt' will contain the complete list of 
files with a processing time > 1 second.
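
The monitoring described above could be sketched roughly as follows. 
This is an illustration in Python under my own assumptions, not the 
actual ASSP code: the function names, the process callback and the output 
line format are hypothetical. Only the one second threshold, the limit of 
10 entries and the file name 'rebuilddebug.txt' come from the text above.

```python
import time

SLOW_THRESHOLD_SECONDS = 1.0  # only files slower than this are reported
TOP_N = 10                    # number of entries for the rebuild log


def time_files(paths, process, clock=time.perf_counter):
    """Run process(path) for each file; return [(seconds, path)] for slow ones."""
    slow = []
    for path in paths:
        start = clock()
        process(path)  # hypothetical per-file processing callback
        elapsed = clock() - start
        if elapsed > SLOW_THRESHOLD_SECONDS:
            slow.append((elapsed, str(path)))
    return slow


def top_slowest(slow, limit=TOP_N):
    """Return the slowest entries, longest runtime first."""
    return sorted(slow, reverse=True)[:limit]


def write_debug_list(slow, stream):
    """Write the complete list of slow files, longest runtime first."""
    for seconds, path in sorted(slow, reverse=True):
        stream.write(f"{seconds:.2f}s {path}\n")
```

A caller would pass the files of one corpus folder to time_files, log the 
result of top_slowest, and hand an open 'rebuilddebug.txt' to 
write_debug_list.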


Thomas
 



_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test
