Hi all,
while investigating a rebuild runtime that was, IMHO, too long (I had
assumed a bug in the code), I found that the issue was caused by only a
few files in the corpus. Some of these files took one minute or more each
to process.
This long processing time does not depend on the size of the files; it
depends only on their contents. All of the affected files are heavily
HTML encoded and contain a very large number of HTML tags.
For example: after I removed 6 files from the corpus, the rebuild time
dropped from 20 to 10 minutes.
Such a long processing time could lead to long-held record locks on
Spamdb, Whitelistdb and Redlistdb, which could cause any other thread to
get stuck (locked), and it will peak the CPU usage of a single core at
100% for a long time.
The next release will monitor the processing time for every file. For
every corpus folder, the 10 longest runtimes will be written to the
rebuild log, provided the processing time for a single file is larger
than one second. The file 'rebuilddebug.txt' will contain the complete
list of files with a processing time > 1 second.
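The monitoring described above could be sketched roughly as follows. This
is only an illustration in Python, not the actual ASSP code: the names
time_folder, write_debug_list and process_file are hypothetical, and the
line format written to 'rebuilddebug.txt' is an assumption.

```python
import time

SLOW_THRESHOLD_SECONDS = 1.0  # from the post: only files slower than 1 second
TOP_N = 10                    # from the post: ten longest runtimes per folder


def time_folder(files, process_file, clock=time.perf_counter):
    """Time process_file() for each file in one corpus folder.

    Returns (top, slow): 'slow' holds every (seconds, filename) pair whose
    runtime exceeded the threshold, longest first; 'top' is its first
    TOP_N entries, i.e. what would go into the rebuild log.
    process_file is a hypothetical stand-in for the real per-file work.
    """
    slow = []
    for name in files:
        start = clock()
        process_file(name)
        elapsed = clock() - start
        if elapsed > SLOW_THRESHOLD_SECONDS:
            slow.append((elapsed, str(name)))
    slow.sort(reverse=True)  # longest runtime first
    return slow[:TOP_N], slow


def write_debug_list(slow, path="rebuilddebug.txt"):
    """Write the complete list of slow files (assumed line format)."""
    with open(path, "w", encoding="utf-8") as fh:
        for seconds, name in slow:
            fh.write(f"{seconds:.2f}s {name}\n")
```

Measuring around each single file keeps the overhead negligible and makes
it easy to spot the few files that dominate the rebuild time.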
Thomas