According to Max Pyziur:
> I recently built a 3.1.6 RPM for my system and ran the
> new binaries against our website. The processing was
> about 30% slower taking 62 minutes vs ~47. The site
> has about 38,000 html pages. The processing was
> about the same as that of 3.1.5 until about the
> 18,000th item.

Well, during 3.1.6 development, I found that the HTML parser changes I introduced in July/August caused an approximately 15% slowdown in htdig on my 500-page web site. I felt that slowdown was acceptable given the increased reliability of the newer parser. I wasn't aware of any degradation as more and more pages were indexed, though.

I'd be interested in seeing some profiling done on htdig to see where it's spending its time while indexing a fairly large site such as yours. I suspect that the new parser having to allocate a new Dictionary object for every HTML tag it parses would account for much of the slowdown, but I don't see why that would get worse after indexing a couple of tens of thousands of pages. It could be a symptom of an as-yet undiscovered memory leak.

If you're willing to spend some time enabling profiling and testing htdig in this way, we'd appreciate the feedback.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
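
Before doing full profiling, it may help to pin down where in the run the slowdown begins. The following is a minimal sketch in Python, not part of htdig: it assumes you have somehow recorded how many seconds each fixed-size batch of documents took (for example from timestamps in verbose output), and it reports the first batch that is markedly slower than the ones before it. The function name `first_slow_batch`, the `factor` and `warmup` parameters, and the sample timings are all hypothetical and chosen only for illustration.

```python
from statistics import median


def first_slow_batch(batch_seconds, factor=1.5, warmup=3):
    """Return the index of the first batch that took more than `factor`
    times the median of all earlier batches, or None if no batch does.

    The first `warmup` batches are used only to establish a baseline.
    """
    for i in range(warmup, len(batch_seconds)):
        baseline = median(batch_seconds[:i])
        if batch_seconds[i] > factor * baseline:
            return i
    return None


# Made-up timings (seconds per batch of 2,000 documents), not measured data.
timings = [70, 72, 69, 71, 70, 73, 71, 70, 72, 130, 150, 170]

slow = first_slow_batch(timings)
if slow is None:
    print("no batch stands out as slow")
else:
    print(f"slowdown begins at batch {slow}, "
          f"around document {slow * 2000}")
```

If the per-batch time jumps at one point and stays high, that points to something crossing a threshold (memory, a data structure growing past some size) rather than a uniform per-tag cost, which would help decide what to profile first.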

