According to Max Pyziur:
> I recently built a 3.1.6 RPM for my system and ran the 
> new binaries against our website.  The processing was 
> about 30% slower taking 62 minutes vs ~47.  The site
> has about 38,000 html pages.  The processing was 
> about the same as that of 3.1.5 until about the 
> 18,000th item.

Well, during 3.1.6 development, I found that the HTML parser changes I
introduced in July/August caused an approximately 15% slowdown in htdig
on my 500-page web site.  I felt that slowdown was acceptable due to
the increased reliability of the newer parser.  I wasn't aware of any
degenerative effects as more and more pages were indexed, though.

I'd be interested in seeing some profiling done on htdig to see where
it's spending its time while indexing a fairly large site such as yours.
I suspect that the fact that the new parser must allocate a new Dictionary
object for every HTML tag it parses would account for much of the
slowdown, but I don't see why that would degenerate after indexing a
couple of tens of thousands of pages.  It could be a symptom of an as-yet
undiscovered memory leak.  If you're willing to spend some time enabling
profiling and testing htdig in this way, we'd appreciate the feedback.
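One way to tell a steady per-tag cost from the kind of degeneration Max
describes is to look at throughput per batch of pages rather than at total
run time.  A fixed extra cost per tag (such as allocating a Dictionary)
should slow every batch by roughly the same amount, while a leak or an
ever-growing data structure would show up as later batches taking longer
than early ones.  The Python sketch below assumes you already have a list
of page-completion times in seconds, measured from the start of the run
(for example, by timestamping each line of htdig's verbose output as it is
printed).  The function names, batch size and 1.5x threshold are
illustrative choices of mine, not anything that exists in htdig.

```python
from typing import List, Optional, Sequence


def batch_durations(finish_times: Sequence[float], batch_size: int) -> List[float]:
    """Split ascending per-page finish times (seconds since the run started)
    into batches of batch_size pages and return how long each full batch took.
    A trailing partial batch is ignored."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    durations = []
    start = 0.0  # assumption: timestamps are measured from the start of the run
    # end_idx walks over the last page index of each complete batch
    for end_idx in range(batch_size - 1, len(finish_times), batch_size):
        end = finish_times[end_idx]
        durations.append(end - start)
        start = end
    return durations


def first_degraded_batch(durations: Sequence[float],
                         baseline_batches: int = 3,
                         factor: float = 1.5) -> Optional[int]:
    """Return the index of the first batch (after the baseline ones) that took
    more than `factor` times the mean of the first `baseline_batches` batches,
    or None if throughput stayed flat.  The threshold is an arbitrary choice."""
    if baseline_batches <= 0:
        raise ValueError("baseline_batches must be positive")
    if len(durations) < baseline_batches:
        return None
    baseline = sum(durations[:baseline_batches]) / baseline_batches
    for i in range(baseline_batches, len(durations)):
        if durations[i] > factor * baseline:
            return i
    return None
```

With batch_size=1000 on a run the size of yours, a non-None result i would
narrow the onset of the slowdown to pages i*1000+1 through (i+1)*1000, which
is where profiling or a leak checker would be most worth pointing.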

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html