According to Joe R. Jah:
> The patch applies like a charm; the source compiles without a problem.
> Everything runs like a charm;) It seems to me that searching is
> noticeably faster too;) however, indexing takes three times as long
> as 3.1.6 with Armstrong patch:( I wonder why;-/
>
> _________3.1.6-082901 + ssl.4 + Armstrong patch________
> htdig: Start digging: Tue Sep 25 19:43:14 PDT 2001
> htmerge: Start merging: Tue Sep 25 20:09:57 PDT 2001
> htmerge: Total word count: 110412
> htmerge: Total documents: 7279
> htmerge: Total doc db size (in K): 117405
> htnotify: Start notifying: Tue Sep 25 20:12:25 PDT 2001
> htfuzzy: Start fuzzying: Tue Sep 25 20:12:33 PDT 2001
> rundig: end rundig: Tue Sep 25 20:13:35 PDT 2001
> ____________3.1.6-092301 + ssl.4 + this patch__________
> htdig: Start digging: Wed Sep 26 15:44:56 PDT 2001
> htmerge: Start merging: Wed Sep 26 17:10:49 PDT 2001
> htmerge: Total word count: 107762
> htmerge: Total documents: 7095
> htmerge: Total doc db size (in K): 115092
> htnotify: Start notifying: Wed Sep 26 17:13:02 PDT 2001
> htfuzzy: Start fuzzying: Wed Sep 26 17:13:10 PDT 2001
> rundig: end rundig: Wed Sep 26 17:14:16 PDT 2001
> ________________________________________________________
Geoff Hutchison responded:
> This tells me that on your system, the rx library is faster than the
> system library regex calls. On the other hand, many people cheered
> when we switched from the rx library to regex for the Endings fuzzy
> generation. (Remember the complaints that German endings took weeks
> to generate?)
>
> This may explain some of the different performance reports with the
> 3.2 code as this uses regex calls heavily.
>
> (Hmm. Maybe the configure test should try benchmarking...)

If I recall correctly, though, when Joe did profiling on 3.2, there were
millions of unexplained calls to regcomp(), when there really shouldn't
have been more than one or a few per document. We never got to the
bottom of this. As for the execution speed of regex code, I think most
of the delay would be due to regcomp(), as regexec() is normally pretty
quick.

However, the url_rewrite_rules support in 3.1.6 should result in at
most a few calls to regcomp() at the very start, when the HtURLRewriter
instance is first created. It shouldn't be continually calling
regcomp(), so I don't think it should add an appreciable delay to the
digging process, assuming the code is working correctly. I'd be very
interested in seeing some profiling done with 3.1.6 on Joe's system.

Also, looking at Joe's logs above, I notice a few important differences:

- the word counts and document counts are different. Surprisingly,
  they're smaller for the slower dig, but they indicate that you're not
  digging exactly the same site.
- the snapshots are different. Was there really a snapshot on Wed.,
  Aug 29, or is that supposed to be 082601? In either case, there were
  code changes made to the HTML parser between the two snapshots, so we
  don't know whether part of the problem is that the new parser is
  slower.

It might be useful to compare the timings of the two parser versions.
To have a really objective test of the Armstrong patch vs.
Geoff's variation, both should be applied to the same 3.1.6 snapshot,
and both should be tested against the same document set on a quiescent
system.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
