According to Christopher Murtagh:
> On Wed, 2004-04-21 at 09:13, Lachlan Andrew wrote:
> > Yes, I agree that we need a more "polished" patch for the
> > distribution. I still like my intermediate path: If *any* server
> > blocks or URL blocks are used, then the user takes the performance
> > hit and re-parses each time.
>
> That sounds like a decent plan to me. However, 'performance hit' is a
> serious understatement with the current code. Without my patch, my Dual
> 3GHz Xeon had one CPU pegged at 100% for 8 hours and still only managed
> to index a little under 1000 pages per hour. Once I stopped re-parsing
> the URL list at for each URL, the CPU usage was down to about 6-10%,
> essentially making it an I/O bound problem rather than CPU.
>
> I suspect, as Gilles suggests that there are probably other
> optimizations in the Regex code that could help out a lot in this
> matter.

Yes, what I initially had in mind was pretty simple. The Regex object
already stores the compiled pattern for the last value used. I'd just
add to that object the string that was compiled to produce that binary
pattern, so that we could skip the compile whenever the same string is
passed in again. That, in itself, would be a trivial addition. It
wouldn't be a full caching scheme with multiple patterns stored, so we
wouldn't need to worry about cache flushing.

However, it occurs to me that this fix wouldn't be a huge help to users
who do use exclude_urls or bad_querystr within URL or server blocks.
The reason is that htdig alternates between servers to distribute the
load. Once it has used a pattern from one URL or server block, it
quickly moves on to another server, which has a different pattern, and
that forces a recompile. Still, it would likely be an improvement over
what it does now for the vast majority of users.

> > If *no* server/URL blocks are used, we use Chris's patch. This should
> > be just as fast as Chris's patch (in the "3.1-compatibly mode" without
> > server/URL blocks), and just as flexible as the current status
> > (if blocks are used).
>
> I can't believe it took me until today to find the block configuration
> in the documentation. Once I found it, it seems to be in an obvious
> place, but perhaps it needs a mention as a feature in the front page of
> the docs?

Yeah, it's not very prominent in the docs right now, and it ought to be
made more obvious. Most users of 3.2 don't even know about this feature.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
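
P.S. A minimal sketch of the single-pattern idea described above, for illustration only. It is not the actual ht://Dig Regex class: `CachedRegex`, `set()`, `match()`, `compile_count()` and `alternating_compiles()` are hypothetical names, and `std::regex` stands in for whatever regex library the real code wraps. The object remembers the string it last compiled and skips the compile when handed the same string again.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for a regex wrapper; not the real ht://Dig class.
class CachedRegex {
public:
    // Compile `pattern` only if it differs from the last string compiled.
    // Returns true if a compile actually happened, false on a cache hit.
    bool set(const std::string& pattern) {
        if (compiled_ && pattern == source_) {
            return false;  // same string as last time: reuse compiled form
        }
        regex_ = std::regex(pattern, std::regex::extended);
        source_ = pattern;  // remember what the compiled form came from
        compiled_ = true;
        ++compile_count_;
        return true;
    }

    // True if any part of `text` matches the current pattern.
    bool match(const std::string& text) const {
        return compiled_ && std::regex_search(text, regex_);
    }

    // Number of real compiles so far (only here to make the effect visible).
    std::size_t compile_count() const { return compile_count_; }

private:
    std::regex regex_;
    std::string source_;
    bool compiled_ = false;
    std::size_t compile_count_ = 0;
};

// Usage sketch: alternating between two patterns, as happens when htdig
// switches between server blocks, defeats a one-entry cache on each switch.
inline std::size_t alternating_compiles() {
    CachedRegex re;
    const std::string a = "/cgi-bin/";
    const std::string b = "\\.(gif|jpg)$";
    for (int i = 0; i < 3; ++i) {
        re.set(a);  // pattern changed since last call: compile
        re.set(a);  // same pattern again: cache hit
        re.set(b);  // different server block: compile
    }
    return re.compile_count();
}
```

With only one remembered pattern there is nothing to evict, which is why no flushing logic is needed. The alternation loop shows the limitation mentioned above: repeated use of the same pattern is free, but every switch between patterns still costs a compile.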
_______________________________________________
ht://Dig Developer mailing list: [EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev